Paper presentations are the heart of a PharmaSUG conference. PharmaSUG 2020 will feature over 200 paper presentations, posters, and hands-on workshops. Papers are organized into 15 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 09-Mar-2020.


Advanced Programming

Paper No. Author(s) Paper Title
AP-005 Jeffrey Meyers %MVMODELS: a Macro for Survival and Logistic Analysis
AP-007 Richann Watson
& Louise Hadden
Quick, Call the "FUZZ": Using Fuzzy Logic
AP-018 Louise Hadden Like, Learn to Love SAS® Like
AP-022 Jane Eslinger It’s All about the Base—Procedures, Part 2
AP-073 Stephen Sloan
& Kirk Paul Lafler
Fuzzy Matching Programming Techniques Using SAS® Software
AP-083 Brett Jepson Sometimes SQL Really Is Better: A Beginner's Guide to SQL Coding for DATA Step Users
AP-090 Josh Horstman Dating for SAS Programmers
AP-093 Siqi Huang One Macro to create more flexible Macro Arrays and simplify coding
AP-108 Shuqi Zhao Collapsing Daily Dosing Records in NONMEM PopPK Datasets
AP-129 Pradeep Acharya
& Aniket Patil
Shell Script automation for SDTM/ADaM and TLGs
AP-136 Janet Stuelpner Transpose Procedure: Turning it Around Again
AP-218 Vikas Dhongde How I learned Python in 15 days
AP-236 Sathish Saravanan
& Kameswari Pindiprolu
100% Menu Driven SAS Windows Batch Job Runner and Log Analyzer
AP-240 Akhil Vijayan
& Limna Salim
Next Level Programming-Reusability and Importance of Custom Checks
AP-251 Timothy Harrington How to Achieve More with Less Code
AP-255 Mike Molter Python-izing the SAS Programmer 2: Objects, Data Processing, and XML
AP-259 Soujanya Konda
& Vishnu Bapunaidu Vantakula
Data Puzzle Made Easy: CTC Grading
AP-264 Ramki Muthu Hidden Gems in SAS Editor: Old Wine in New Bottle
AP-273 Alina Blyzniuk What About Learning New Languages?
AP-278 Jackie Fitzpatrick SAS Formats: Same Name, Different Definitions FORMAT-ters of Inconvenience
AP-300 Vladlen Ivanushkin Let’s OPEN it. The OPEN function in macro programming.
AP-312 Todd Case
& Charu Shankar
Complex Clinical Trial Coding Challenges: A SAS Coder and a SAS Trainer United Approach
AP-342 Alice Cheng Speeding Up Your Validation Process is As Easy As 1, 2, 3
AP-369 Keith Shusterman
& Mario Widel
Using PROC FCMP to Create Custom Functions in SAS

Applications Development

Paper No. Author(s) Paper Title
AD-042 Andre Couturier
& Beilei Xu
& Nicolas Dupuis
& Rie Ichihashi
& Patrice Sevestre
Automation of Statistical Reporting deliverables is impossible today
AD-046 Girish Kankipati
& Hao Meng
Normal is Boring, Let’s be Shiny: Managing Projects in Statistical Programming Using the RStudio® Shiny® App
AD-052 Girish Kankipati
& Boxun Zhang
‘This Is Not the Date We Need. Let’s Backdate’: An Approach to Derive First Disease Progression Date in Solid Tumor Trials
AD-055 Jeff Xia
& Shunbing Zhao
A Set of VBA Macros to Compare RTF Files in a Batch
AD-071 Stephen Sloan Using SAS® to Create a Build Combinations Tool to Support Modularity
AD-075 Clio Wu Clinical Data Standardization and Integration using Data Warehouse and Data Mart Process
AD-076 Masaki Mihaila Let Unix/Linux be on your side
AD-084 Ellen Lin
& Aditya Tella
& Yeshashwini Chenna
& Michael Hagendoorn
Metadata-driven Modular Macro Design for SDTM and ADaM
AD-088 Jeffrey Meyers Demographic Table and Subgroup Summary Macro %TABLEN
AD-102 Jiannan Hu Moving Information to and from aCRF and Seal the eSub Package
AD-106 Igor Goldfarb
& Ella Zelichonok
Macro to Produce SAS®-Readable Table of Content from TLF Shells
AD-114 Troy Hughes Chasing Master Data Interoperability: Facilitating Master Data Management Objectives Through CSV Control Tables that Contain Data Rules that Support SAS® and Python Data-Driven Software Design
AD-130 Sara Shoemaker
& Matthew Martin
& Robert Kleemann
& David Costanzo
& Tobin Stelling
A Novel Solution for Converting Case Report Form Data to SDTM using Configurable Transformations
AD-142 Jeffrey Meyers Data Library Comparison Macro %COMPARE_ALL
AD-146 Sara McCallum MKADRGM: A Macro to Transform Drug-Level SDTM Data into Traceable, Regimen-Level ADaM Data Sets
AD-169 Binyang Zhang
& Yejin Huang
Comprehensive Guide: Road Map to Process MedDRA/WHODrug Dictionary Up-Version
AD-171 Craig Chin
& Lawrence Madziwa
Clinical Database Metadata Quality Control: Two Approaches using SAS and Python
AD-199 Robert Macarthur, PharmD, MS, BCSCP Efficiently Prepare GMP Compliant Batch Records for Drug Manufacturing and 503A/503B Pharmacy Compounding, with SAS
AD-203 Benjamin Straub
& Michael Rimler
R4GSK: The Journey to Integrate R Programming into the Clinical Reporting Process
AD-208 Yuping Wu
& Jan Skowronski
Programming Patient Narratives Using Microsoft Word XML files
AD-211 William Wei Semi-Automated and Modularized Approach to Generate Tables for Clinical Study Reports (CSR)
AD-296 Valerie Williams SAS Migration from Unix to Windows and Back
AD-298 Sean Yang
& Hrideep Antony
& Aman Bahl
RStats: A R-Shiny application for statistical analysis
AD-308 Julie Stofel Using Data-Driven Python to Automate and Monitor SAS Jobs
AD-333 Bharath Donthi
& Lingjiao Qi
Integrated P-value summary table for expedited review
AD-341 Mike Stackhouse Opening Doors for Automation with Python and REST: A SharePoint Example
AD-347 Yang Gao A Tool to Organize SAS Programs, Output and More for a Clinical Study

Artificial Intelligence (Machine Learning)

Paper No. Author(s) Paper Title
AI-014 Jim Box Do you know what your model is doing? How human bias impacts machine learning.
AI-025 Karen Walker Gradient Descent: Using SAS for Gradient Boosting
AI-058 Surabhi Dutta Pattern Detection for Monitoring Adverse Events in Clinical Trials - Using Real Time, Real World Data
AI-061 Kevin Lee How I became a Machine Learning Engineer from Statistical Programmer
AI-224 Mirai Kikawa
& Yuichi Nakajima
How to let Machine Learn Clinical Data Review as it can Support Reshaping the Future of Clinical Data Cleaning Process
AI-242 Roshan Stanly
& Ajith Baby Sadasivan
& Limna Salim
Automate your Safety tables using Artificial Intelligence & Machine Learning
AI-314 Ganeshchandra Gupta SAS® Viya®: The R Perspective
AI-324 Daniel Ulatowski Emergence of Demand Sensing in Value, Access and Reimbursement Pharma Divisions
AI-358 Xuan Sun Identifying Rare Diseases from Extremely Imbalanced Data using Machine Learning

Data Standards

Paper No. Author(s) Paper Title
DS-012 Cleopatra DeLeon
& Laura Bellamy
Dynamically Harvesting Study Dates to Construct and QC the Subject Visits (SV) SDTM Domain
DS-023 Janet Stuelpner
& Olivier Bouchard
& Mira Shapiro
Data Transformation: Best Practices for When to Transform Your Data
DS-031 Jennifer Fulton Ensuring Consistency Across CDISC Dataset Programming Processes
DS-062 Christine McNichol Untangling the Subject Elements Domain
DS-080 Sumit Pratap Pradhan Standardised MedDRA Queries (SMQs): Programmers Approach from Statistical Analysis Plan (SAP) to Analysis Dataset and Reporting
DS-082 Weiwei Guo Implementation of Immune Response Evaluation Criteria in Solid Tumors (iRECIST) in Efficacy Analysis of Oncology Studies
DS-109 Lyma Faroz
& Jinit Mistry
Impact of WHODrug B3/C3 Format on Coding of Concomitant Medications
DS-110 Lyma Faroz
& Sruthi Kola
Demystifying SDTM OE, MI, and PR Domains
DS-117 Kuldeep Sen
& Sumida Urval
& Yang Wang
CDISC-compliant Implementation of iRECIST and LYRIC for Immunomodulatory Therapy Trials
DS-133 Kapila Patel
& Nancy Brucken
Is Your Dataset Analysis-Ready?
DS-195 Charumathy Sreeraman Standardizing Patient Reported Outcomes (PROs)
DS-196 Sowmya Srinivasa Mukundan
& Charumathy Sreeraman
Simplifying PGx SDTM Domains for Molecular biology of Disease data (MBIO).
DS-226 Donna Sattler
& Sharon Hartpence
From the root to branches, how Standards Teams utilize Decision Trees to accept or reject Standard Change Requests.
DS-235 Song Liu
& Cindy Song
& Mijun Hu
& Jieli Fang
Tackle Oncology Dose Intensity Analysis from EDC to ADaM
DS-248 Amy Garrett Best practices for annotated CRFs
DS-256 Mike Molter A codelist’s journey from the CDISC Library to a study through Python
DS-261 Sergiy Sirichenko SUPPQUAL datasets: good bad and ugly
DS-285 Pravinkumar Doss Know the Ropes - Standard for Exchange of Non-clinical Data (SEND)
DS-295 Bhargav Koduru
& Andi Dhroso
& Tanya Teslovich
Get Ready! Personalized Medicine is here and so is the data.
DS-306 Alyssa Wittle
& Jennifer McGrogan
SDTM to ADaM Programming: Take the Leap!
DS-318 Maddy Wilks
& Sara Warren
Blood, Bacteria, and Biomarkers: Understanding the scientific background behind human clinical lab tests to aid in compliant submissions
DS-323 Jerry Salyers Considerations in Implementing the CDASH Model v1.1 and the CDASH Implementation Guide v2.1
DS-329 Soumya Rajesh
& Michael Wise
Overcoming Pitfalls of DS: Shackling 'the Elephant in the Room'
DS-339 Lex Jansen Extracting Metadata from the CDISC Library using SAS with PROC LUA
DS-344 Fred Wood Trial Sets in Human Clinical Trials
DS-351 Lynn Mullins
& Ajay Gupta
Have You Heard of the TD Domain? It Might Not be What You Think it is.
DS-365 Sunil Gupta Creating SDTMs and ADaMs CodeList Lookup Tables
DS-368 Mario Widel
& Henry Winsor
Good versus Better SDTM – Some Annoying Standard Dictionary Issues

Data Visualization and Reporting

Paper No. Author(s) Paper Title
DV-004 Jeffrey Meyers Library Datasets Summary Macro %DATA_SPECS
DV-006 Richann Watson What's Your Favorite Color? Controlling the Appearance of a Graph
DV-009 Richann Watson Great Time to Learn GTL: A Step-by-Step Approach to Creating the Impossible
DV-029 Sean Lacey A Templated System for Interactive Data Visualizations
DV-050 Taniya Muliyil Oncology Graphs-Creation (Using SAS and R), Interpretation and QA
DV-057 Hao Meng
& Yating Gu
& Yeshashwini Chenna
R for Clinical Reporting, Yes – Let's Explore It!
DV-066 Xiangchen (Bob) Cui
& Sri Pavan Vemuri
Simplifying the Derivation of Best Overall Response per RECIST 1.1 and iRECIST in Solid Tumor Clinical Studies
DV-132 Xingshu Zhu
& Bo Zheng
Automation of Flowchart using SAS
DV-135 Huei-Ling Chen
& William Wei
Making Customized ICH Listings with ODS RTF
DV-148 Raghava Pamulapati Butterfly Plot for Comparing Two Treatment Responses
DV-157 Shunbing Zhao
& Jeff Xia
Automating of Two Key Components in Analysis Data Reviewer’s Guide
DV-158 Min Xia Enhanced Visualization of Clinical Pharmacokinetics Analysis by SAS GTL
DV-163 Yida Bao
& Zheran Rachel Wang
& Jingping Guo
& Philippe Gaillard
plots and story with diabetes data
DV-164 Radhika Etikala
& Xuehan Zhang
Using R Markdown to Generate Clinical Trials Summary Reports
DV-166 David Kelley An Introduction to the ODS Destination for Word
DV-191 Jing Pan Early phase data visualization: how to get ahead in the game
DV-193 Bill Coar Subset Listings: An Application of Item Stores
DV-198 Siruo Wang
& Keaven Anderson
& Yilong Zhang
r2rtf – an R Package to Produce Rich Text Format Tables and Figures
DV-204 Benjamin Straub Static to Dynamic: A one language Programmer to a multi-lingual Programmer with RShiny in Six Weeks
DV-214 Haihua Kan
& Ziying Chen
Producing Lab Shift Tables for Oncology Study
DV-215 Ziying Chen
& Haihua Kan
Producing a Swimmer Plot for TEAE by using GTL
DV-227 Samundeeswari Raja Safety Reports using PROC STREAM
DV-252 Soujanya Konda A Sassy substitute to represent the longitudinal data – The Lasagna Plot
DV-282 Kishore Kumar Sundaram Hierarchical Data in Clinical Trials - The Need for Visualization and Possible Solutions
DV-283 Chandana Sudini
& Bindya Vaswani
Programming Technique for Line Plots with Superimposed Data Points
DV-294 Yan Li Common SAS Tips for Patient Profile Programming
DV-299 Shuozhi Zuo
& Hong Yan
Effective Exposure-Response Data Visualization and Report by Combining the Power of R, SAS programming and VBScript
DV-350 Louise Hadden Dressing Up your SAS/GRAPH® and SG Procedural Output with Templates, Attributes and Annotation

Hands-On Training

Paper No. Author(s) Paper Title
HT-091 Josh Horstman Getting Started Creating Clinical Graphs in SAS with the SGPLOT Procedure
HT-100 Vince DelGobbo Creating Custom Microsoft Excel Workbooks Using the SAS® Output Delivery System, Part 1
HT-111 Troy Hughes YO.Mama is Broke 'Cause YO.Daddy is Missing: Autonomously and Responsibly Responding to Missing or Invalid SAS® Data Sets Through Exception Handling Routines
HT-139 Deanna Schreiber-Gregory
& Peter Flom
Why You are Using PROC GLM Too Much (and What You Should Be Using Instead)
HT-143 Kevin Lee Hands-on Training on NLP Machine Learning Programming
HT-375 Sanjay Matange Build Popular Clinical Graphs using SAS
HT-376 Phil Bowsher
& Sean Lopp
Generating Data Analysis Reports with R Markdown
HT-377 Charu Shankar Step-by-step SQL Procedure

Leadership Skills

Paper No. Author(s) Paper Title
LS-008 Richann Watson
& Louise Hadden
Are you Ready? Preparing and Planning to Make the Most of your Conference Experience
LS-016 Carey Smoak One Boys’ Dream: Hitting a Homerun in the Bottom of the Ninth Inning
LS-037 Jeff Xia Microsoft OneNote: A Treasure Box for Managers and Programmers
LS-044 Kala Shivalingaiah Leading a successful team through complex environment
LS-059 Himanshu Patel
& Jeff Xia
An Effective Management Approach for a First-Time Study Lead
LS-118 Scott Burroughs Rethinking Programming Assignments
LS-120 Jim Baker Inspirational Leadership - The Infinite Versus Finite Approach
LS-123 Priscilla Gathoni Schoveing Series 5: Three Question Marks: Your Guide to a Quality NO
LS-131 Frank Menius Live Fire Exercises: Observations of a Salty Field Training Officer on How to Better Train and Retain Quality Programmers
LS-162 Clio Wu Which Side of the Fence Are You on? A Head of Statistical Programming’s perspectives working for Sponsor and CRO
LS-185 Siva Ramamoorthy Leadership Lessons from Start-ups
LS-263 Charan Kumar Kuyyamudira Janardhana Theory of P and P - Personalized and Predictive Human Resource Management – Evolving face of HRM
LS-265 Ravi Kankipati
& Prasanna Sondur
Building a Strong Remote Working Culture – Statistical Programmers Viewpoint
LS-272 Richard DAmato
& Ajay Gupta
Leadership Skills: Tips to Retain and Motivate Millennials at Work
LS-297 Darpreet Kaur The Art of Work Life Balance.
LS-309 Aravind Gajula Leadership from the point of view of individual contributor
LS-320 David Polus For My Next Act… Leveraging One’s Skills to Start Anew in Pharma / Biotech
LS-322 Amanda Truesdale
& Bill Donovan
Strategies to building, growing and maintaining a high-performance global statistical programming team
LS-330 Kriss Harris Leadership and Programming
LS-359 Janette Garner Leading without Authority: Leadership At All Levels
LS-364 Jian Hua (Daniel) Huang
& Rajan Vohra
& Andy Chopra
Project Metrics- a powerful tool that supports workload management and resource planning for Biostats & Programming department.
LS-373 Kirsty Lauderdale
& Paul Slagle
Managing Transitions so Your Life is Easier

Medical Devices

Paper No. Author(s) Paper Title
MD-020 Carey Smoak CDISC Standards for Medical Devices: Historical Perspective and Current Status
MD-041 Phil Hall Successful US Submission of Medical Device Clinical Trial using CDISC
MD-360 Shilpakala Vasudevan An overview of medical device data standards and analytics

Quick Tips

Paper No. Author(s) Paper Title
QT-035 Alex Karanevich
& Michael Ames
A SAS Macro for Calculating Confidence Limits and P-values Under Simon’s Two-Stage Design
QT-038 Jianguo Li Programmer guild for missing data handling
QT-049 Alex Ostrowski PROC COMPARE: Misnomer of the statement “NOTE: No unequal values were found. All values compared are exactly equal.”
QT-127 Li Liu Remove Strikethrough Texts from Excel Documents by VBA Macro
QT-128 Yongjiang (Jerry) Xu
& Yanhua (Katie) Yu
A Solution to Look-Ahead Observations
QT-167 Lilyana Gross Transpose Data from Wide to Long With Output Statements
QT-178 Jianlin Li Python applications in clinical data management
QT-183 Ajay Sinha
& M Chanukya Samrat
A Brief Understanding of DOSUBL beyond CALL EXECUTE
QT-194 Bill Coar Addressing (Non) Repeating Order Variables in Proc Report
QT-209 Sachin Aggarwal
& Sapan Shah
Automation of Conversion of SAS Programs to Text files
QT-213 Manohar Modem
& Bhavana Bommisetty
A SAS Macro for Dynamic Assignment of Page Numbers
QT-249 Noory Kim Text Wrangling with Regular Expressions: A Short Practical Introduction
QT-253 Timothy Harrington Implementing a LEAD Function for Observations in a SAS DATA Step
QT-280 Pratap Kunwar A Macro to Add SDTM Supplemental Domain to Standard Domain
QT-289 Abhinav Srivastva Highlight changes: An extension to PROC COMPARE
QT-313 Ray Pass PROC REPORT – Land of the Missing OBS Column
QT-315 Yuping Wu
& Sayeed Nadim
A SAS macro for tracking the Status of Table, Figure and Listing (TFL) Programming
QT-345 Sarbani Roy What are PRX Functions and Call Routines and examples of their application in Clinical SAS Programming
QT-349 Sachin Aggarwal
& Sapan Shah
Macro for controlling Page Break options for Summarizing Data using Proc Report
QT-354 Taylor Markway Updates to A Macro to Automatically Flag Baseline SDTM

Real World Evidence and Big Data

Paper No. Author(s) Paper Title
RW-053 Jayanth Iyengar NHANES Dietary Supplement component: a parallel programming project
RW-113 Troy Hughes Better to Be Mocked Than Half-Cocked: Data Mocking Methods to Support Functional and Performance Testing of SAS® Software
RW-161 Zhouming (Victor) Sun Visualization of Big Data Generating from Real-Time Monitoring of Clinical Trial
RW-192 Tabassum Ambia Natural History Study – A Gateway to Treat Rare Disease
RW-275 Valentina Aguilar Simple uses of PROC CLUSTER: Pruning big datasets and identifying data issues
RW-319 Irene Cosmatos
& Michael Bulgrien
Standardizing Laboratory Data From Diverse RWD to Enable Meaningful Assessments of Drug Safety and Effectiveness
RW-372 Cara Lacson
& Carol Matthews
Life After Drug Approval… What Programmers Need to Know About REMS

Software Demonstrations (Tutorials)

Paper No. Author(s) Paper Title
SD-121 Vince DelGobbo Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You
SD-281 Chris Hardwick
& Justin Slattery
& Hans Gutknecht
A Single, Centralized, Biometrics Team Focused Collaboration System for Analysis Projects

Statistics and Analytics

Paper No. Author(s) Paper Title
SA-013 Marina Komaroff
& Sandeep Byreddy
Diaries and Questionnaires: Challenges and Solutions
SA-021 Bob Matsey Agile Technology Integration Enables Analytic Development & Speed to Production
SA-034 Deanna Schreiber-Gregory A Doctor's Dilemma: How Propensity Scores Can Help Control For Selection Bias in Medical Education
SA-051 Girish Kankipati
& Chia-Ling Ally Wu
Calculation of Cochran–Mantel–Haenszel Statistics for Objective Response and Clinical Benefit Rates and the Effects of Stratification Factors
SA-056 Jiannan Kang ADaM Implementation of Immunogenicity Analysis Data in Therapeutic Protein Drug Development
SA-072 Stephen Sloan
& Kevin Gillette
Assigning agents to districts under multiple constraints using PROC CLP
SA-103 Qiuhong Jia
& Fang-Ting Kuo
& Chia-Ling Ally Wu
& Ping Xu
Risk-based and Exposure-based Adjusted Safety Incidence Rates
SA-112 Troy Hughes
& Louise Hadden
Should I Wear Pants? And Where Should I Travel in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures
SA-147 Christopher Brown
& Christine Campbell
Programmatic method for calculating Main Effect Interaction p-value in SAS 9.4 using GLMMOD to create Indicator Variables and associated Design Matrix.
SA-149 Giulia Tonini
& Letizia Nidiaci
& Simona Scartoni
Sample size and HR confidence interval estimation through simulation of censored survival data controlling for censoring rate
SA-207 Erica Goodrich
& Daniel Sturgeon
A Brief Introduction to Performing Statistical Analysis in SAS, R & Python
SA-225 Andrea Nizzardo
& Giovanni Marino Merlo
& Simona Scartoni
Principal Stratum strategy for handle intercurrent events: a causal estimand to avoid biased estimates
SA-234 Krishnakumar K P
& Aswathy S
Bayesian Based Dose Escalation Clinical Trial Designs
SA-243 Shuqi Zhao Imputation for Missing Dosing Time in NONMEM PopPK Datasets
SA-262 Kevin Venner
& Jennifer Ross
& Kyle Huber
& Noelle Sassany
& Graham Nicholls
Using SAS Simulations to determine appropriate Block Size for Subject Randomization Lists
SA-284 Steven Gilbert Implementing Quality Tolerance Limits at a Large Pharmaceutical Company
SA-370 Vidhya Parameswaran-Iyer A Visual Guide to Selecting Appropriate Matching Solutions in Epidemiology

Strategic Implementation

Paper No. Author(s) Paper Title
SI-019 Tho Nguyen
& Ken Pikulik
Affordably Accessing Bulk Cloud Storage for Your Analytics
SI-060 Wayne Zhong Doing Less Work for Better Results - Process Optimization in DTLG Production
SI-170 Kobie O'Brian
& Sara Shoemaker
& Robert Kleemann
& Kate Ostbye
Moving A Hybrid Organization Towards CDISC Standardization
SI-173 Amy Gillespie
& Susan Kramlik
& Suhas Sanjee
PROC Future Proof;
SI-205 Homer Wang Redefine The Role of Analysis Programming Lead in Modern Clinical Trials
SI-206 Amber Randall
& Bill Coar
Assessing Performance of Risk-based Testing
SI-230 Yevgeniy Telestakov
& Viktoriia Telestakova
& Andrii Klekov
Be proud of your Legacy - Optimized creation of Legacy datasets
SI-241 Vaughn Eason
& Jake Gallagher
You down to QC? Yeah, You know me!
SI-250 Patricia Guldin
& Jing Su
Single Database for Pharmacometric and Clinical Data
SI-374 Paul Slagle
& Eric Larson
Automating Clinical Process with Metadata

Submission Standards

Paper No. Author(s) Paper Title
SS-030 Joy Zeng
& Varaprasad Ilapogu
& Xinping Cindy Wu
End-to-end Prostate-Specific Antigen (PSA) Analysis in Clinical Trials: From Mock-ups to ADPSA
SS-045 Lucas Du
& William Paget
& Lingyun Chen
& Todd Case
Updates in SDTM IG V3.3: What Belongs Where – Practical Examples
SS-054 Shefalica Chand
& Eric Song
RTOR: Our Side of the Story
SS-081 Sandra Minjoe Why Are There So Many ADaM Documents, and How Do I Know Which to Use?
SS-094 Girish Rajeev Implementation of Metadata Repository & Protocol Automation at Bayer Pharma
SS-095 Michael Hagendoorn
& Ellen Lin
Masters of the (Data Definition) Universe: Creating Data Specs Consistently and Efficiently Across All Studies
SS-097 Jinit Mistry
& Lyma Faroz
& Hao Meng
Data Review: What’s Not Included in Pinnacle 21?
SS-134 Chintan Pandya
& Majdoub Haloui
Essential steps to avoid common mistakes when providing BIMO Data Package
SS-140 Ajay Gupta Pinnacle 21 Community v3.0 - A Users Perspective
SS-150 Akari Kamitani
& HyeonJeong An
& Yura Suzuki
& Malla Reddy Boda
& Yoshitake Kitanishi
Challenges and solutions for e-data submission to PMDA even after submission to FDA
SS-151 Majdoub Haloui
& Hong Qi
Supplementary Steps to Create a More Precise ADaM define.xml in Pinnacle 21 Enterprise
SS-156 Abhilash Chimbirithy
& Saigovind Chenna
& Majdoub Haloui
Analysis Package e-Submission – Planning and Execution
SS-159 Hema Muthukumar
& Kobie O'Brian
Automating CRF Annotations using Python
SS-160 Sarada Golla Submission Standards: A SAS Macro approach for submission of software programs (*.sas)
SS-197 Elizabeth Li
& Carl Chesbrough
& Inka Leprince
Preparing a Successful BIMO Data Package
SS-200 David Izard Machine Readable Data Not Required for EMA… Really?
SS-287 Trevor Mankus Looking Back: A Decade of ADaM Standards
SS-291 Michael Beers Confusing Data Validation Rules Explained, Part 2
SS-292 Savithri Jajam Split Character Variables into Meaningful Text
SS-317 Ji Qi
& Yan Li
& Lixin Gao
Improving the Quality of Define.xml: A Comprehensive Checklist Before Submission
SS-321 Bhargav Koduru No more a trial by fire! Trial summary made easy!!
SS-325 Kristin Kelly Getting It Right: Refinement of SEND Validation Rules
SS-327 Lingjiao Qi
& Bharath Donthi
Step-by-step guide for a successful FDA submission
SS-371 Hariprasath Narayanaswamy Deciphering Codelists - Better Understanding and Handling of Controlled Terminology Metadata in Define.xml and in Reviewer’s Guides

ePosters


Paper No. Author(s) Paper Title
EP-064 Azia Tariq
& Janaki Chintapalli
A Guide for the Guides: Implementing SDTM and ADaM standards for parallel and crossover studies
EP-099 Charley Wu Color Data Listings and Color Patient Profiles
EP-168 Zhouming (Victor) Sun One Graph to Simplify Visualization of Clinical Trial Projects
EP-172 Nancy Brucken
& Peter Schaefer
& Dante Di Tommaso
TDF – Overview and Status of the Test Data Factory Project, Standard Analyses & Code Sharing Working Group
EP-174 Nancy Brucken
& Dante Di Tommaso
& Jane Marrer
& Mary Nilsson
& Jared Slain
& Hanming Tu
Standard Analyses and Code Sharing Working Group Update
EP-175 Jane Lu Use of Cumulative Response Plots in Clinical Trial Data
EP-176 Yuichi Nakajima 10 things you need to know about PMDA eSubmission
EP-177 Thu Dinh
& Goutam Chakraborty
Detecting Side Effects and Evaluating the Effectiveness of Drugs from Customers’ Online Reviews using Text Analytics, Sentiment Analysis and Machine Learning Models
EP-237 Jianwei Liu CTCAE up-versioning – a simple way to deal with the complexity of lab toxicity grading
EP-268 Jayanth Iyengar
& Josh Horstman
Look Up Not Down: Advanced Table Lookups in Base SAS
EP-270 Nirmal Balasubramanian
& Praveenraj Mathivanan
My Favorite SAS® Tips, Tricks and Techniques
EP-337 Vipin Kumpawat
& Lalitkumar Bansal
Generating ADaM compliant ADSL Dataset by Using R
EP-353 Louise Hadden Visually Exploring Proximity Analyses Using SAS® PROC GEOCODE and SGMAP and Public Use Data Sets
EP-355 Bob Matsey Integrating Your Analytics In Database with SAS, Hadoop and other data types in Teradata EDW using Agile Analytics to analyze many different data types.
EP-362 Bhanu Bayatapalli SDSP: Sponsor and FDA Liaison
EP-367 Danyang Bing
& Randi McFarland
Confirmation of Best Overall Tumor Response in Oncology Clinical Trials per RECIST 1.1


Advanced Programming

AP-005 : %MVMODELS: a Macro for Survival and Logistic Analysis
Jeffrey Meyers, Mayo Clinic

The research field of clinical oncology relies heavily on the methods of survival analysis and logistic regression. Analyses involve one or more variables within a model, and multiple models are often compared within subgroups. Results are prominently displayed either in a table or graphically in a forest plot. The MVMODELS macro performs every step of a univariate or multivariate analysis: running the analysis, organizing the results into datasets for printing or plotting, and creating the final output as a table or graph. MVMODELS can run and extract statistics from multiple models at once, perform subgroup analyses, and output to most file formats, and it offers a large variety of options to customize the final output. It is a powerful tool for analyzing and visualizing one or more statistical models.

AP-007 : Quick, Call the "FUZZ": Using Fuzzy Logic
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

SAS® practitioners are frequently called upon to do a comparison of data between two different data sets and find that the values in synonymous fields do not line up exactly. A second quandary occurs when there is one data source to search for particular values, but those values are contained in character fields in which the values can be represented in myriad different ways. This paper discusses robust, if not warm and fuzzy, techniques for comparing data between, and selecting data in, SAS data sets in not so ideal conditions.

AP-018 : Like, Learn to Love SAS® Like
Louise Hadden, Abt Associates Inc.

How do I like SAS®? Let me count the ways.... There are numerous instances where LIKE or LIKE statements can be used in SAS - and all of them are useful. This paper will walk through such uses of LIKE as: searches and joins with that smooth LIKE operator (and the NOT LIKE operator); the SOUNDS LIKE operator; using the LIKE condition to perform pattern-matching and create variables in PROC SQL; and PROC SQL CREATE TABLE LIKE to create empty data sets with appropriate metadata.
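
As a rough illustration of the wildcard matching that the LIKE operator performs, the sketch below mimics it in Python rather than SAS. It assumes the usual SQL conventions, in which % matches any run of characters (including none) and _ matches exactly one character. The helper names like_to_regex and like are hypothetical and do not come from the paper.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE-style pattern into a compiled regular expression.

    Assumed conventions: '%' matches any run of characters, '_' matches
    exactly one character, and everything else is taken literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # literal character, e.g. '.' stays a dot
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


names = ["Smith", "Smyth", "Smooth", "Jones"]
print([n for n in names if like(n, "Sm_th")])
print([n for n in names if like(n, "Sm%")])
```

The match is anchored at both ends, so a pattern without wildcards behaves like an equality test.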

AP-022 : It’s All about the Base—Procedures, Part 2
Jane Eslinger, SAS Institute

“It’s All about the Base—Procedures” (a PharmaSUG 2019 paper) explored the strengths and challenges of commonly used Base SAS® procedures. It also compared each procedure to others that could accomplish similar tasks. This paper takes the comparison further, focusing on the FREQ, MEANS, TABULATE, REPORT, PRINT, and SQL procedures. As a programmer, whether novice or advanced, it is helpful to know when to choose which procedure. The first paper provided best-use cases, and this paper takes it a step further in its discussion of when to choose one procedure over another. It also provides example code to demonstrate how to get the most out of the procedure that you choose.

AP-073 : Fuzzy Matching Programming Techniques Using SAS® Software
Stephen Sloan, Accenture
Kirk Paul Lafler, Software Intelligence Corporation

Data comes in all forms, shapes, sizes and complexities. SAS® users across industries know all too well that data, whether stored in files or data sets, can be, and often is, problematic and plagued with a variety of issues. Two data files can be joined without a problem when they have identifiers with unique values. However, many files do not have unique identifiers, or “keys”, and need to be joined by character values, like names or E-mail addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates data sets containing a sampling of data issues; popular data cleaning and user-defined validation techniques; data transformation techniques; traditional merge and join techniques; an introduction to the application of different SAS character-handling functions for phonetic matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED; and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less-than-perfect, or “messy”, data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character-handling.
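
Since the abstract notes that these techniques carry over to other platforms, here is a minimal Python sketch of one of them: joining on character values by edit distance. It uses the classic Levenshtein distance, which is the measure COMPLEV is built around, but it is not the SAS implementation. The function names, the normalization step, and the max_distance cutoff are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(value.lower().split())


def fuzzy_join(left, right, max_distance=2):
    """Pair each left key with its closest right key if it is close enough.

    max_distance is a hypothetical cutoff; a real project would tune it.
    """
    matches = []
    for l in left:
        if not right:
            break
        best = min(right, key=lambda r: levenshtein(normalize(l), normalize(r)))
        distance = levenshtein(normalize(l), normalize(best))
        if distance <= max_distance:
            matches.append((l, best, distance))
    return matches


print(fuzzy_join(["Jon Smith", "ACME  Corp"], ["John Smith", "Acme Corp", "Jane Doe"]))
```

Cleaning the keys before measuring distance mirrors the order described above: standardize first, then apply the fuzzy comparison only to what standardization cannot resolve.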

AP-083 : Sometimes SQL Really Is Better: A Beginner's Guide to SQL Coding for DATA Step Users
Brett Jepson, Rho Inc.

Structured Query Language (SQL) in SAS® not only provides a powerful way to manipulate your data, it also enables users to perform, in a clean and concise way, programming tasks that would otherwise require multiple DATA steps, SORT procedures, and other summary statistical procedures. Often, SAS users use SQL only for the specific tasks with which they are comfortable. They do not explore its full capabilities due to their unfamiliarity with SQL. This presentation introduces SQL to the SQL novice in a way that attempts to overcome this barrier by comparing SQL with more familiar DATA step and PROC SORT methods, including a discussion of tasks that are done more efficiently and accurately using SQL and tasks that are best left to DATA steps.

AP-090 : Dating for SAS Programmers
Josh Horstman, Nested Loop Consulting

Every SAS programmer needs to know how to get a date... no, not that kind of date. This paper will cover the fundamentals of working with SAS date values, time values, and date/time values. Topics will include constructing date and time values from their individual pieces, extracting their constituent elements, and converting between various types of dates. We'll also explore the extensive library of built-in SAS functions, formats, and informats for working with dates and times using in-depth examples. Finally, you'll learn how to answer that age-old question... when is Easter next year?
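
For readers who have not worked with SAS dates, the Python sketch below illustrates the underlying representation: a SAS date value is the number of days since 1 January 1960, and a SAS datetime value is the number of seconds since midnight on that day. The function names are hypothetical. The Easter calculation uses the well-known anonymous Gregorian algorithm as a stand-in, since the abstract does not say which method the paper uses.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day zero for SAS date values


def to_sas_date(d: date) -> int:
    """Days elapsed since 1 January 1960."""
    return (d - SAS_EPOCH).days


def from_sas_date(n: int) -> date:
    """Calendar date for a day count measured from 1 January 1960."""
    return SAS_EPOCH + timedelta(days=n)


def to_sas_datetime(dt: datetime) -> float:
    """Seconds elapsed since midnight, 1 January 1960."""
    return (dt - datetime(1960, 1, 1)).total_seconds()


def easter(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(to_sas_date(date(2020, 1, 1)))
print(easter(2021))
```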

AP-093 : One Macro to create more flexible Macro Arrays and simplify coding
Siqi Huang, Boehringer Ingelheim

The purpose of using a macro array is to make it easier to repetitively execute SAS code. A macro array is defined as a list of macro variables sharing the same given prefix and a numeric suffix, such as A1, A2, A3, etc., plus an additional macro variable with a suffix of “N” containing the length of the array. In this paper, I will introduce a %MAC_ARRAY macro, which provides a more flexible way to create a macro array from any given list of values, or from any selected variable in a dataset, either character or numeric. The application of this macro array is broad as well, including but not limited to: 1) creating similar datasets; 2) stacking multiple datasets; 3) repeating the same calculation among multiple variables in the datasets; 4) automatically updating parameters used in other macros. To sum up, the %MAC_ARRAY macro can easily keep your code neat and improve program efficiency.

AP-108 : Collapsing Daily Dosing Records in NONMEM PopPK Datasets
Shuqi Zhao, Merck & Co., Inc.

Dosing history data plays an important role in a NONMEM-ready population Pharmacokinetics (PopPK) dataset. In late stage clinical trials, patients may take daily doses for up to or over a year. Incorporating such a long dosing history can make the NONMEM dataset extremely large considering the number of patients and duration of dosing. In order to produce a compact dataset, one way is to collapse daily dosing records into combined records using NONMEM reserved variables: ADDL (additional doses) and II (dosing interval). If the same dosage of drug has been taken at the same time every day consecutively by a patient, then these daily dosing records can be collapsed into one record with start dosing datetime, dosage level, ADDL and II (which is 24 hours for QD). Possible deviations such as dose delays, dose reductions or dose interruptions need to be taken into consideration, and must be preserved and reflected when collapsing daily dosing records. This paper presents a flexible and adaptable programming approach using the SAS DATA step to collapse daily dosing data into combined dosing records while considering dose amount and dosing time and reflecting dose interruptions. Examples in this paper focus on collapsing daily dosing records for use in NONMEM modeling/analysis, but the programming approach discussed in this paper can also be adapted for other dosing regimens (such as BID) and for other analysis datasets if suitable.
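
The collapsing rule described above can be illustrated with a short Python sketch. It is not the paper's DATA step code: the record layout, the function name, and the choice to set II to 0 on uncollapsed records are assumptions made for illustration. A dose is folded into the previous record only when the amount is unchanged and the dose falls exactly one interval after the last dose already covered, so a delay, interruption, or dose change starts a new record.

```python
from datetime import datetime, timedelta


def collapse_doses(records, interval_hours=24):
    """Collapse (datetime, amount) dosing records into ADDL/II style records.

    Each output record is a dict with the start time, the amount, ADDL
    (number of additional identical doses) and II (interval in hours).
    """
    collapsed = []
    for time, amt in sorted(records):
        if collapsed:
            last = collapsed[-1]
            # Time at which the next dose is expected if the regimen continues.
            expected = last["time"] + timedelta(
                hours=interval_hours * (last["addl"] + 1))
            if amt == last["amt"] and time == expected:
                last["addl"] += 1
                last["ii"] = interval_hours
                continue
        # First record, or a deviation (delay, interruption, dose change).
        collapsed.append({"time": time, "amt": amt, "addl": 0, "ii": 0})
    return collapsed


doses = [
    (datetime(2020, 1, 1, 8), 100),
    (datetime(2020, 1, 2, 8), 100),
    (datetime(2020, 1, 3, 8), 100),
    # 4 January missed: dose interruption
    (datetime(2020, 1, 5, 8), 100),
    (datetime(2020, 1, 6, 8), 50),   # dose reduction
]
result = collapse_doses(doses)
```

Exact equality of dosing times is a deliberately strict choice here; a real implementation would likely allow a tolerance window around the expected time.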

AP-129 : Shell Script automation for SDTM/ADaM and TLGs
Pradeep Acharya, Ephicacy Lifescience Analytics
Aniket Patil, Pfizer Inc

In clinical trials, every study has multiple datasets (SDTM/ADaM) and a number of tables, listings and figures (TLFs). Throughout the study, there are many instances where you are required to re-run the datasets due to new incoming data or updates in key datasets such as the subject level dataset (ADSL), which results in a refresh of all other datasets and TLFs to maintain timestamp consistency. One approach is to run the datasets and TLFs one after the other, which consumes a significant amount of time for program execution. Another commonly used method is to manually create an executable batch script file that includes all the program names. The drawback of both is that you can inadvertently miss a dataset or TLF program due to manual error, and it is cumbersome to go through the logs and outputs to identify which program has been missed. To overcome such situations and reduce repetitive tasks, we have developed an automated program which creates the shell script by reading the program names directly from the designated study folder for all the datasets and/or TLFs, thereby eliminating any chance of missing a program. This program also provides flexibility to decide the order in which your programs will be run, as is generally needed with SDTM/ADaM programs. If there is a one-to-one relation between programs and outputs for TLFs, more functionality can be added to cross-check the numbers and confirm a smooth run. Having this facility frees up programmer time for more productive work.
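
As a rough illustration of the approach, the Python sketch below builds the text of a shell script from the program files found in a folder. It is not the authors' tool: the function name, the `run_first` parameter for controlling execution order, and the `sas <program>` invocation are assumptions, and the actual batch command depends on the local SAS installation.

```python
from pathlib import Path


def build_run_script(program_dir, run_first=(), sas_command="sas"):
    """Return shell script text that runs every .sas program in program_dir.

    Programs named in run_first (for example the ADSL program) are placed
    first, in the order given; all remaining programs follow alphabetically.
    """
    programs = sorted(p.name for p in Path(program_dir).glob("*.sas"))
    first = [name for name in run_first if name in programs]
    rest = [name for name in programs if name not in first]
    lines = ["#!/bin/sh"]
    lines += [f"{sas_command} {name}" for name in first + rest]
    return "\n".join(lines) + "\n"
```

Because the list is read from the folder each time the script is generated, a newly added program is picked up without anyone having to edit the script by hand.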

AP-136 : Transpose Procedure: Turning it Around Again
Janet Stuelpner, SAS

In the life science industry, CDISC standards dictate that we keep the data for several domains in a vertical format. Storing the data in this format is very efficient, as there is less waste of space. In order to create our tables, listings and figures, we need to transform or transpose the data into a horizontal format. This is a more efficient way to analyze the data. There are many ways in which this can be done. The purpose of the Transpose Procedure is to reshape the data so that it can be stored as needed and then analyzed in the easiest way possible. PROC TRANSPOSE is the easiest and most complex procedure in SAS. It has only five options. This paper will revisit how to change the format of the data. You will be taken from the easiest way of doing the transformation without any options to a more complex manner using many options.

AP-218 : How I learned Python in 15 days
Vikas Dhongde, Accenture Solutions Pvt. Ltd.

With the advent of Big Data and analytics a few years back, programming languages supporting such number crunching, data transformation and visualization gained much of the stardom that they enjoy today. That’s when the erstwhile dormant (as I was too obsessed with SAS) yet curious programmer in me woke up to take on these new challenges. Naturally, the first choice was Python, as it is a programmer’s language as opposed to R. As a working professional with minimal time to spare during weekdays, I set an ambitious target of 15 days including weekends. With sheer zeal for learning something new, I embarked upon a journey to see this new language and to check how it fares against my favorite, i.e. SAS. This paper presents my experience of learning a whole new language, my strategy and conclusions, with the hope that it will help budding programmers in their pursuit of similar goals.

AP-236 : 100% Menu Driven SAS Windows Batch Job Runner and Log Analyzer
Sathish Saravanan, Quartesian Clinical Research
Kameswari Pindiprolu, Quartesian Clinical Research

SAS is widely used software in the clinical research industry for clinical data management, analysis and reporting. Every clinical study requires numerous SAS programs to be developed for generating Datasets (D), Tables (T), Listings (L) and Graphs (G). Rerunning programs is inevitable due to changes in the programming specifications for DTLG. At the same time, for each and every rerun, it is mandatory to ensure that the log of every submitted program is free from SAS-defined potential messages. Various methods have been discussed for bulk rerunning of programs and analysing the logs, but none of them is a menu-driven approach. Companies are investing a great deal of money to develop and maintain menu-driven tools that perform bulk rerunning of SAS programs along with log analysis. The purpose of this paper is to provide a Microsoft Excel VBA based tool that is 100% menu driven to perform bulk running of SAS programs along with log analysis, process summary and easy access to a particular log file among several files. It also creates a SAS program file with all the selected programs for later use. The proposed approach needs only 4 inputs from the user, as follows: 1. Program location 2. Log location for the log files to be saved 3. LST output location for the LST output file to be saved 4. Selection of the programs to run

AP-240 : Next Level Programming-Reusability and Importance of Custom Checks
Akhil Vijayan, Genpro Research Inc.
Limna Salim, Genpro Life Sciences

SDTM domain structures and relationships are similar across studies within a therapeutic area, which leads to code standardization and reusability, especially within ISS/ISE submissions. Interim data transfers also come with changes in data, leading to reruns of existing programs with minor updates. The possibility of errors in such scenarios is large, with truncation in data, new data issues going unidentified, attribute changes, etc. This paper details the importance of using standard macros and alternative programming approaches, like enabling custom checks/warnings, that make reusability of programs a much smoother process. Not all data issues are identified at the initial stage of Source Data Validation, but they tend to surface during the development of CDISC datasets. In addition, certain data issues identified at the initial run might not necessarily be resolved in the next data transfer. This is where custom errors/warnings play a significant role. Similarly, in the case of statistical programming, standard code might be replicated for various TFLs with changes only to the parameters considered. In such cases, specific custom checks based on parameters also become important. This paper discusses various situations with examples where user-defined errors/warnings can be implemented, like - Validation of subjects included alongside DM data - Verifying the length of source variables (ensuring data after 200 characters are successfully mapped to SUPP) - Verifying the baseline flags populated after derivation. The paper also discusses the need for standard macros which support custom checks, like macros for - Source data variable length check - Automating formats - Attribute generation

AP-251 : How to Achieve More with Less Code
Timothy Harrington, Navitas Data Sciences

One of the goals of a SAS Programmer should be to achieve a required result with the minimal amount of code. The two reasons for "code complexity" are: one, too many lines of code, and two: code which is unduly difficult to decipher, for example a large number of nested operations, pairs of parentheses, operators, and symbols packed closely together. This paper describes methods for reducing the amount and complexity of SAS code, and for avoiding repetition of code. Included are examples of SAS code using v9.2 or later functions such as IFN, IFC, CHOOSE, WHICH, and LAG. This content and discussion is primarily intended for beginner and intermediate SAS users.

AP-255 : Python-izing the SAS Programmer 2: Objects, Data Processing, and XML
Mike Molter, PRA Health Sciences

As a long-time SAS programmer curious about what other languages have to offer, I cannot deny that the leap from SAS to the object-oriented world is not a small one to be taken lightly. Anyone looking for superficial differences in syntax and keywords will soon see that something more fundamental is at play. Have no fear though, for languages such as Python have plenty of similarities to give the SAS programmer a strong base of knowledge from which to start their education. In this sequel to an earlier paper I wrote, we will explore Python approaches to programming tasks common in our industry, taking every opportunity to expose their similarities to SAS approaches. After an introduction to objects, we’ll see the many ways that Python can manipulate data, all of which will look familiar to SAS programmers. With a solid working knowledge of objects, we’ll then see how easy object-oriented programming makes the generation of common industry XML. This paper is intended for SAS programmers of all levels with a curiosity about, and an open mind to something slightly beyond our everyday world.

AP-259 : Data Puzzle Made Easy: CTC Grading
Soujanya Konda, GCE Solutions Pvt Ltd
Vishnu Bapunaidu Vantakula, Novartis

Clinical data analysis is an important milestone in achieving drug approval. Grading of the Lab/Adverse Event data plays a pivotal role in classifying the results and shaping data interpretation. Though the conversion of lab results to the International System of Units (SI) is a little messy, the conversion yields fruitful results in the lab toxicity grades. There are certain obstacles in grading the data due to similar reference limits being applicable to multiple grades for the same lab test. The unidirectional/bidirectional journey of lab data, which results in numerous grades from negative toxicity to positive toxicity, helps us understand the shift of results, which helps in data analysis. This paper emphasizes the conversion of lab data to SI units, an introduction to CTC grading, strategies for assigning lab grades, and various methods to overcome a few challenges.

AP-264 : Hidden Gems in SAS Editor: Old Wine in New Bottle
Ramki Muthu, Senior SAS Programmer

Clinical SAS programmers tend to spend more than 90% of their work time using the SAS editor, and it is imperative to maximize the available functionalities for their routine tasks. While most programmers are already familiar with handling the SAS editor, and the topic has been reviewed earlier, this paper attempts to hunt down some hidden (or underutilized) features in the enhanced SAS editor window for Windows. Briefly, starting with a few common etiquettes, this paper discusses: a summary of keyboard shortcuts that programmers should be aware of; window management to make a split view of a program and utilize tile vertical options; using command bars to display required variables from a dataset, subsetting the records and precisely moving to the desired record (e.g. from the 1st to the 28474th record) using the command prompt. With a few examples, it also discusses how we could utilize keyboard macros for common tasks and to learn new (or unfamiliar) SAS procedures.

AP-273 : What About Learning New Languages?
Alina Blyzniuk, Quartesian

In the modern world, everyone is trying to be better and one of the points is learning new languages. Someone does this for development and someone to improve communication in different countries. In our daily work of creating datasets and reports, we do not always think about the fact that there are unusual ways of solving problems that can make our life easier and also raise the level of knowledge in our work area. So why not discover alternative approaches to use in SAS? This paper gives a brief overview of the built-in languages in SAS and which of them can be used in the procedural step and which ones should be installed. These languages are Macro, SQL, FEDSQL, DS2, IML, GTL and SCL. Each of them has its advantages, strengths, interesting opportunities that we can use in different tasks.

AP-278 : SAS Formats: Same Name, Different Definitions FORMAT-ters of Inconvenience
Jackie Fitzpatrick, SCHARP

It’s always best to catch data discrepancies early in the analysis, especially with the format library. Background: Our organization receives behavioral questionnaire data from outside sources as SPSS sav files. Typically, they use the question number as their variable name. Now, this seems like a great idea, but question number one in section B at Baseline may not be the same for the following study visits. For example, the question could have 1 as Yes, 2 as No, but the following visits use 1 as Once, 2 as Twice, and 3 as Three or More. To find discrepancies between formats in a quick and efficient manner, my solution consists of two programs: one that finds discrepancies and creates an Excel spreadsheet to show them, and a second that helps you fix the discrepancies in a proficient manner. In conclusion, this process will save time, resources, and hair pulling.

AP-300 : Let’s OPEN it. The OPEN function in macro programming.
Vladlen Ivanushkin, Cytel Inc.

Every SAS programmer knows how to access data with the standard DATA step and/or the SQL procedure. However, for certain tasks there is no need to utilize them. Or, better said, there are more convenient and more efficient ways. Almost every CRO or pharma company has its own set of SAS® macros for retrieving metadata information from a data set, for instance %getvarlist, %getvarlabel, %getvarfmt, etc. But what usually stands behind such macros? And when is it most profitable to use this kind of macro? Is it restricted to the metadata only, or can one get access to the data itself the same way? Let’s try to find the answers to these questions. In this paper I would like to discuss how the OPEN function can be used in macro programming, what types of helpful utilities could be created and where they can best be utilized.

AP-312 : Complex Clinical Trial Coding Challenges: A SAS Coder and a SAS Trainer United Approach
Todd Case, Vertex Pharmaceuticals
Charu Shankar, SAS

SAS coders live in a rich community of other SAS users. A chance meeting of a SAS coder and a SAS trainer at PharmaSUG China 2019 led us to look at some common challenges in the pharmaceutical industry and how they can, and should, be looked at from different perspectives - and what better combination than someone from industry and someone who trains programmers in the industry?! In this paper we address 3 challenges using multiple coding techniques: 1. The Subject Elements (SE) domain - identifying periods and obtaining first/last dates within a period, 2. Looping through all datasets in a library (or multiple libraries) to get any type of information - in this case the total number of distinct patients in all the raw data as well as finding all partial dates, and 3. Where, who and how to ask for help with a complex problem when you are simply stuck.

AP-342 : Speeding Up Your Validation Process is As Easy As 1, 2, 3
Alice Cheng, Independent

After the execution of the COMPARE procedure, a return code value is generated and stored in the macro variable &SYSINFO. The code indicates the result of the comparison of the BASE dataset against the COMPARE dataset. Based on this single value, users can identify which (if any) of the 16 conditions have been violated. In this paper, the author introduces &SYSINFO and develops a macro named %QCRESULT that reports all code values of &SYSINFO from the comparison of each dataset, table, listing and figure. As a result, by looking into all &SYSINFO values, users can have an overall view of the current comparison of the entire clinical study report.
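
A single return code can carry 16 conditions because each condition occupies one bit of the value. The Python sketch below shows the decoding idea; it is not the %QCRESULT macro. The condition names follow the PROC COMPARE return-code table as recalled here and should be verified against the SAS documentation; the function name is hypothetical.

```python
# (bit value, condition code) pairs; codes as recalled from the PROC COMPARE
# documentation -- treat them as an assumption and verify before relying on them.
SYSINFO_BITS = [
    (1, "DSLABEL"), (2, "DSTYPE"), (4, "INFORMAT"), (8, "FORMAT"),
    (16, "LENGTH"), (32, "LABEL"), (64, "BASEOBS"), (128, "COMPOBS"),
    (256, "BASEBY"), (512, "COMPBY"), (1024, "BASEVAR"), (2048, "COMPVAR"),
    (4096, "VALUE"), (8192, "TYPE"), (16384, "BYVAR"), (32768, "FATAL"),
]


def decode_sysinfo(code: int):
    """Return the names of all conditions whose bit is set in the return code."""
    return [name for bit, name in SYSINFO_BITS if code & bit]


clean = decode_sysinfo(0)         # no differences reported
flagged = decode_sysinfo(4160)    # 4096 + 64
```

A value of 0 means no condition was raised, which is why a validation summary can scan for nonzero codes first and decode only those.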

AP-369 : Using PROC FCMP to Create Custom Functions in SAS
Keith Shusterman, Reata
Mario Widel, Independent

PROC FCMP is a powerful but somewhat underused tool that can be leveraged for creating custom functions. While macros are of vital importance to SAS programming, there are many situations where creating a custom function can be more desirable than using a macro. In particular, custom functions created through PROC FCMP can be used in other SAS procedures such as PROC SQL, PROC PRINT, or PROC FREQ. As an alternative to SAS macros, PROC FCMP can be used to create custom functions that process character and numeric date and datetime values. In this presentation, we will share two custom functions created using PROC FCMP. The first creates ISO 8601 character values from raw date inputs, and the second creates numeric SAS dates and datetimes from ISO 8601 inputs. We will also show how these newly created custom functions can be called in other SAS procedures.
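
The two conversions can be sketched in Python to show the logic involved; this is an analogy, not the authors' PROC FCMP code, and both function names are hypothetical. The sketch relies on the fact that SAS stores dates as days, and datetimes as seconds, counted from 1 January 1960. Partial dates are handled only on the output side, as an assumption about what "raw date inputs" may look like.

```python
from datetime import date, datetime

SAS_EPOCH_DATE = date(1960, 1, 1)
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)


def to_iso8601(year, month=None, day=None):
    """Build an ISO 8601 date string, truncating at the first missing part."""
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
        if day is not None:
            parts.append(f"{day:02d}")
    return "-".join(parts)


def iso8601_to_sas(value):
    """Convert a complete ISO 8601 date or datetime string to a SAS numeric value."""
    if "T" in value:
        # Datetime: seconds since 1960-01-01T00:00:00
        return (datetime.fromisoformat(value) - SAS_EPOCH_DATETIME).total_seconds()
    # Date: days since 1960-01-01
    return (date.fromisoformat(value) - SAS_EPOCH_DATE).days
```

For example, `to_iso8601(2020, 3)` yields a month-level partial date, while `iso8601_to_sas` raises an error on such a value because a numeric date requires a complete input.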

Applications Development

AD-042 : Automation of Statistical Reporting deliverables is impossible today
Andre Couturier, Sanofi
Beilei Xu, Sanofi
Nicolas Dupuis, Sanofi
Rie Ichihashi, Sanofi
Patrice Sevestre, Sanofi

Complete automation of statistical reporting deliverables is impossible today. Too many missing or moving parts and continuous changes in science, standards, computing environments, and Health Authority requirements make it an unreachable Holy Grail. Even the trials themselves are "adaptive". Any attempt to achieve E2E automation will require large investment, the collaboration of different internal functions (Data Management, Medical Writing, Biostatistics & Statistical Programming, IT) to provide machine readable content, etc. The next best thing is to build an application eco-system that can deliver semi-automated assisted programming based on sponsor augmented metadata and that gives programmers control over the system generated programs. This strategy increases quality and efficiency by avoiding re-entry of similar information in multiple places and allowing programmers to concentrate on study specific value added tasks. In this paper we will describe MAP: Sanofi's Metadata Assisted Programming project that aims to support the generation of all submission statistical programming deliverables (e.g. SDTM, ADaM, de-identified datasets, TLF). MAP uses metadata from Protocol, CRF, SAP, TLF, CDISC and Sponsor specific standards in conjunction with user input to build study specific reporting metadata. APIs then deliver this information to our Python code generators to produce editable template programs in either SAS or R, providing users with the ability to customize and re-generate them at will. Over time, as investments are made to deliver automation enabling applications such as eProtocol and eSAP, the system will require less user input and deliver more automated deliverables such as cSDRG, ADRG, etc.

AD-046 : Normal is Boring, Let’s be Shiny: Managing Projects in Statistical Programming Using the RStudio® Shiny® App
Girish Kankipati, Seattle Genetics Inc
Hao Meng, Seattle Genetics Inc

The ability to deliver projects on schedule, within budget, and aligned with business goals is key to gaining an edge in today’s fast-paced pharmaceutical and biotechnology industry. Project management plays an important role to achieve key milestones with high quality and optimal efficiency. While successful project management is an application of processes, methods, skills, and expertise, tools like SAS®, RStudio®, and Python™ can help track project status and better position the lead programmer to allocate appropriate resources. One suitable app that fits the specific needs for status tracking in Statistical Programming is R Shiny®, a flexible user-friendly tool that can present an instant study status overview in graphical format with minimal coding and maintenance effort. This paper introduces an innovative design to track study status through an RStudio® Shiny® app that is interactive and reusable and can present status on demand. Based on simple server metadata, we can display a graphical representation of not only the total number of SDTM, ADaM, and TLFs that have been programmed and validated, but also the trend in progress to date to help lead programmers and statisticians determine any resource adjustments based on timely and effective status reporting that refreshes on a monthly, weekly, or daily basis to monitor ongoing study progress. Sample project tracker metadata, the high-level inner workings of our Shiny® app, and Shiny® graphs will be discussed in depth in this paper.

AD-052 : ‘This Is Not the Date We Need. Let’s Backdate’: An Approach to Derive First Disease Progression Date in Solid Tumor Trials
Girish Kankipati, Seattle Genetics Inc
Boxun Zhang, Seattle Genetics

A time-to-event analysis, such as duration of response or progression-free survival, is an important component of assessments in oncology clinical trials. Derivation of true response dates is a key feature in developing ADRS and ADTTE ADaM data sets and a solid understanding of how such dates are derived is therefore critical for statistical programmers. This paper discusses hands-on examples in solid tumor trials based on investigator assessment using RECIST 1.1 and presents SAS® programming techniques that are usually implemented in ADRS. Response date derivations are categorized as below: 1. Overall response date derivation: a. When an overall response is disease progression (PD): when at least one target, non-target, or new lesion shows PD, the date of PD is derived from the earliest of all scan dates that showed PD b. When an overall response is equivocal progression: when at least one non-target or new lesion shows equivocal progression, the date of PD will be the earliest of all scan dates that showed equivocal progression c. When an overall response is other than PD or equivocal progression: the response date is derived from the latest of all radiologic scan dates for the given response assessments 2. Progression backdating: If a new lesion progressed to unequivocal right after equivocal assessments, or a non-target lesion progressed to disease progression right after equivocal progression assessments, backdate the progression event to the earliest scan where the new or non-target lesion was assessed as equivocal progression
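
The derivation rules listed above can be sketched as follows. This Python example is a simplified illustration, not the authors' SAS code: the status labels, data layout, and function names are assumptions, and the backdating step works at the level of overall visit responses rather than tracking each individual lesion as the abstract describes.

```python
from datetime import date


def response_date(overall, scans):
    """Derive the date of one overall response.

    scans is a list of (scan_date, lesion_status) pairs for the assessment.
    PD and equivocal progression use the earliest scan showing that status;
    any other response uses the latest of all scan dates.
    """
    if overall in ("PD", "EQUIVOCAL"):
        return min(d for d, status in scans if status == overall)
    return max(d for d, _ in scans)


def backdated_progression(visits):
    """Return the progression date, backdated over preceding equivocal visits.

    visits is a chronologically ordered list of (overall, response_date).
    If the first PD immediately follows one or more equivocal assessments,
    the earliest of those equivocal dates is returned instead.
    """
    for i, (overall, _) in enumerate(visits):
        if overall == "PD":
            j = i
            while j > 0 and visits[j - 1][0] == "EQUIVOCAL":
                j -= 1
            return visits[j][1]
    return None  # no progression observed


visits = [
    ("SD", date(2020, 1, 10)),
    ("EQUIVOCAL", date(2020, 2, 12)),
    ("EQUIVOCAL", date(2020, 3, 15)),
    ("PD", date(2020, 4, 20)),
]
progression = backdated_progression(visits)
```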

AD-055 : A Set of VBA Macros to Compare RTF Files in a Batch
Jeff Xia, Merck
Shunbing Zhao, Merck & Co.

Post database lock changes in a clinical trial are impactful and can result in significant rework. Statistical programmers must access the updated data and regenerate the TLFs for a CSR to maintain data and result traceability. This paper briefly discusses the challenges in comparing two sets of RTF files before and after the post database lock changes, each set might contain tens or hundreds of files, and provides an elegant solution based on VBA technology. Three standalone VBA macros were developed to perform this essential task: 1) Compare: programmatically compare each RTF file in two different versions and record the track change(s); 2) Find_Change: scan every RTF file and see whether there are one or more changes between these two versions and produce a report to show whether a file has been changed or remains the same; 3) Change_details: scan every RTF file and for all RTFs with an update provide a report with all the details in the track changes, i.e., what text was inserted, or deleted, etc. Each VBA macro will be described in this paper and reviewed with examples.

AD-071 : Using SAS® to Create a Build Combinations Tool to Support Modularity
Stephen Sloan, Accenture

With SAS® procedures we can use a combination of a manufacturing Bill of Materials and a sales specification document to calculate the total number of configurations of a product that are potentially available for sale. This will allow the organization to increase modularity and reduce cost while focusing on the most profitable configurations with maximum efficiency. Since some options might require or preclude other options, the result is more complex than a straight multiplication of the numbers of available options. Through judicious use of SAS procedures, we can maintain accuracy while reducing the time, space, and complexity involved in the calculations.
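
Why "require" and "preclude" rules break the simple multiplication can be seen in a small Python sketch. This is a brute-force illustration under assumed inputs, not the author's SAS approach: the feature names, the rule format, and the function name are all hypothetical, and option names are assumed to be unique across features.

```python
from itertools import product


def count_configurations(features, requires=(), precludes=()):
    """Count option combinations that satisfy all require/preclude rules.

    features maps a feature name to its list of options (one is chosen per
    feature). requires holds (a, b) pairs meaning "a needs b"; precludes
    holds (a, b) pairs meaning "a and b cannot be combined".
    """
    count = 0
    for combo in product(*features.values()):
        chosen = set(combo)
        if any(a in chosen and b not in chosen for a, b in requires):
            continue
        if any(a in chosen and b in chosen for a, b in precludes):
            continue
        count += 1
    return count


features = {
    "engine": ["V6", "V8"],
    "trim": ["base", "sport"],
    "towing": ["none", "hitch"],
}
unconstrained = count_configurations(features)
constrained = count_configurations(
    features,
    requires=[("hitch", "V8")],     # the hitch needs the V8 engine
    precludes=[("sport", "V6")],    # sport trim is not offered with the V6
)
```

Here straight multiplication gives 2 x 2 x 2 = 8 combinations, but only 5 survive the two rules. Full enumeration grows exponentially with the number of features, which is where the reductions in time and space mentioned above become important.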

AD-075 : Clinical Data Standardization and Integration using Data Warehouse and Data Mart Process
Clio Wu, FMD K&L Inc.

Clinical trial data standardization, integration and analysis have always been a challenge for sponsors and their CRO partners. In particular, sponsors often outsource all or part of their clinical data programming and analysis projects to multiple CROs, and certain marketed products have new expanded indications or Therapeutic Areas (TAs), so projects can easily span over 20 years with non-CDISC legacy data and CDISC standards data combined, which increases the complexity of the data standardization and integration process. This paper will introduce how to use the data warehouse and data mart process to develop a centralized system that collects, organizes, pools, and maintains the most recent version of all raw, SDTM, and ADaM datasets. This system will build upon the work done by the individual studies to facilitate the rapid production of standardized and integrated SDTM, ADaM and statistical outputs in an efficient and accurate manner to support ongoing regulatory agencies IB, AR, DSUR/PSUR, new integrated summary of safety (ISS), integrated summary of efficacy (ISE), and internal or external meta-analysis, manuscript, and publication requests. The ‘data warehouse’ aspect of this process will collect and organize all current raw, legacy and/or CDISC datasets automatically. The ‘data mart’ aspect of this process will collect, convert and/or create pooled data across multiple studies based on specific data standardization and analysis needs. This paper will detail the project objectives, design strategies, the overall life-cycle and utilization. This paper will also share the challenges and lessons learned from implementing the data warehouse and data mart process.

AD-076 : Let Unix/Linux be on your side
Masaki Mihaila, Pfizer

If one has a Unix/Linux system set up at the company, why not make the most of it? The terminal window can be a scary thing to look at, but it can also be a tool to run all one’s chores. This paper describes how to set up scripts to get all the information needed on the terminal instead of checking each location or file. It ultimately saves time in running SAS programs as well as in validating programming work. This paper will go through how to optimize tasks using the Unix terminal.

AD-084 : Metadata-driven Modular Macro Design for SDTM and ADaM
Ellen Lin, Seattle Genetics, Inc.
Aditya Tella, Seattle Genetics, Inc.
Yeshashwini Chenna, Seattle Genetics, Inc.
Michael Hagendoorn, Seattle Genetics, Inc.

In clinical trial data analyses, macros are often used in a modular or building-block fashion to standardize common data transformations in SDTM or derivations in ADaM to heighten efficiency and quality across studies. A traditional design of such macros is the parameter-driven approach, which provides users with many parameters to control differences on input, output, and data processing from study to study. There are limitations with this kind of design, for example the macro may stop working when unexpected differences arise beyond the scope controlled by parameters, or a lengthy revision and documentation cycle is needed to rework parameter-driven code once more variations are introduced by new studies. Metadata-driven programming is a much more dynamic approach for macro design, especially when targeting areas where differences between studies are less predictable. This design allows key portions of logic and processing to be driven by study-specific metadata maintained outside the macro instead of being exclusively controlled by parameters. It also greatly simplifies user input through parameters and opens the door for a robust and stable macro library at department level. This paper will describe the metadata-driven approach in detail, discuss important considerations on design of such metadata to allow sufficient flexibility, and explain several optimal programming techniques for it. We will also provide real-world data processing examples where a traditional parameter-driven macro would be challenging and the metadata-driven approach fits much better.

AD-088 : Demographic Table and Subgroup Summary Macro %TABLEN
Jeffrey Meyers, Mayo Clinic

Clinical trial publications frequently allocate at least one of the allotted tables to summarize the demographics, stratification factors, and other variables of interest of the patients involved with the study. These tables generally include basic distribution information such as frequencies, means, medians, and ranges while also comparing these distributions across a key factor, such as treatment arm, to determine whether there was any imbalance in patient populations when doing analysis. These distributions are not difficult to compute and combine into a table, but as treatments become more specific to patient characteristics such as genetic biomarkers and tumor stages there is a need to be able to display the distributions in subgroups. The macro %TABLEN is a tool developed to compute distribution statistics for continuous variables, date variables, discrete variables, univariate survival time-to-event variables, and univariate logistic regression variables and combine them into a table for publication. %TABLEN has multiple features for comparisons, subgrouping, and outputting to multiple destinations such as RTF and PDF. The macro %TABLEN is a valuable tool for any programmer summarizing patient level data.

AD-102 : Moving Information to and from aCRF and Seal the eSub Package
Jiannan Hu, Sarepta

The eSub package is important for reviewers, and it is mandatory for an NDA/sNDA. The aCRF is an essential part of an eSub package, yet it is often produced on the fly during package development. Annotating, bookmarking, and linking URLs and page numbers with define.xml is tedious, time-consuming, and error-prone, and QCing the package is challenging as well. Automating the process gains efficiency, assures high quality, and relieves some of the burden on statistical programmers. In this paper, we explore different technologies (e.g., JavaScript, Java, Python, and SAS), and combinations of them, for copying annotations from a standard aCRF template to the CRF, extracting annotations and page numbers to an Excel file or SAS dataset, updating the Excel file with the bookmark labels, and creating or updating bookmarks from a project Excel file. With the standard aCRF template and centralized bookmark sheets, consistency can be achieved by design. Using the extracted SDTM information, this paper also discusses checking the aCRF against the SDTM datasets and the controlled terminology in define.xml, which closes the gap left by OpenCDISC checking, where the aCRF is not included.

AD-106 : Macro to Produce SAS®-Readable Table of Content from TLF Shells
Igor Goldfarb, Accenture
Ella Zelichonok, Naxion

The goal of this work is to develop a macro that automates the process of reading the shells for tables, listings and figures (TLF) and transforming them into a SAS®-readable ordered table of content (TOC). The proposed tool can save significant time for biostatisticians and lead programmers, who must repeatedly create, revise, and update shell documents for TLF. Development of the shells for TLF for any clinical trial is a time-consuming task. Copying titles and footnotes from the shell document (typically an MS Word file) into a SAS® program or an external source of titles and footnotes (e.g., an Excel file) is a tedious process requiring scrupulous work that is subject to human error. The proposed macro (developed in Excel VBA) automates this process. It reads the shell document (Word) and creates or updates a SAS®-readable ordered TOC (Excel) in a matter of seconds. The macro identifies the common part of the titles and footnotes for all TLF and detects differences for specific outputs. The developed tool analyzes all the requests for repeating outputs, updates their sequential numbers and titles, and adds them to the TOC. It also allows the population to be changed, if required, for repeating tables/figures. Finally, the macro generates an Excel file containing the ordered TOC that is immediately ready to be used for the final run of SAS® programs to output the planned TLF. Any further updates in the shell document can be incorporated in the TOC simply by rerunning this macro.

AD-114 : Chasing Master Data Interoperability: Facilitating Master Data Management Objectives Through CSV Control Tables that Contain Data Rules that Support SAS® and Python Data-Driven Software Design
Troy Hughes, Datmesis Analytics

Control tables are the tabular data structures that contain control data—the data that direct software execution and which can prescribe dynamic software functionality. Control tables offer a preferred alternative to hardcoded conditional logic statements, which require code customization to modify. Thus, control tables can dramatically improve software maintainability and configurability by empowering developers and, in some cases, nontechnical end users to alter software functionality without modifying code. Moreover, when control tables are maintained within canonical data structures such as comma-separated values (CSV) files, they furthermore facilitate master data interoperability by enabling one control table to drive not only SAS software but also non-SAS applications. This text introduces a reusable method that preloads CSV control tables into SAS temporary arrays to facilitate the evaluation of business rules and other data rules within SAS data sets. To demonstrate the interoperability of canonical data structures, including CSV control tables, functionally equivalent Python programs also ingest these control tables. Master data management (MDM) objectives are facilitated because only one instance of the master data—the control table, and single source of the truth—is maintained, yet it can drive limitless processes across varied applications and software languages. Finally, when data rules must be modified, the control data within the control table must be changed only once to effect corresponding changes in all derivative uses of those master data.
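
As a rough illustration of the control-table idea (not the paper's own code), the Python sketch below reads a small CSV control table of range rules and applies it to a record. The column names (variable, min, max), the sample rules, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical control table; in practice this would be a shared CSV file
# that both SAS and Python programs read.
CONTROL_CSV = """variable,min,max
age,18,65
sbp,90,180
"""

def load_rules(csv_text):
    """Read the control table into {variable: (min, max)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["variable"]: (float(row["min"]), float(row["max"])) for row in reader}

def check_record(record, rules):
    """Return the names of variables that are missing or out of range."""
    failures = []
    for name, (low, high) in rules.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures

rules = load_rules(CONTROL_CSV)
print(check_record({"age": 70, "sbp": 120}, rules))
```

Changing a limit means editing one row of the CSV; the checking code itself stays untouched.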

AD-130 : A Novel Solution for Converting Case Report Form Data to SDTM using Configurable Transformations
Sara Shoemaker, Fred Hutch / SCHARP
Matthew Martin, Fred Hutch / SCHARP
Robert Kleemann, Fred Hutch / SCHARP
David Costanzo, Fred Hutch / SCHARP
Tobin Stelling, Fred Hutch / SCHARP

Converting case report form (CRF) data to SDTM is a complicated process, even when data are collected in CDASH format. Conversion requires many dataset manipulations and demands flexibility in the order of execution. In addition, different domain types require different actions on the source data, e.g. Findings domains require transposition of data records. Many conversion solutions address this complexity by performing data pre-processing, mapping, and post-processing as disparate pipeline sections. Most include programming in a language such as SAS where blocks of code can obscure the details of the transformation from data customers. This paper describes a solution for SDTM conversion that uses a method termed “Configurable Transformations”. This model both achieves conversion using one consistent pipeline for all phases from CRF data to SDTM and provides visibility into the data transformations for non-programmers. This is achieved by a human readable configuration that uses a small set of simple transformation step types to produce derived datasets. These resulting configurations can be defined by data analysts and can be understood by data customers. Our group was able to successfully map and convert all CRF data for an HIV prevention study using this model with no need for procedural code. This paper will go into the details of the Configurable Transformation model and discuss our use of it in converting data for a study.

AD-142 : Data Library Comparison Macro %COMPARE_ALL
Jeffrey Meyers, Mayo Clinic

Reproducible research and sharing of data with repositories are becoming more standard, and so the freezing of data for specific analyses is more crucial than ever before. Maintaining multiple data freezes requires knowing what changed within the data from one version to another. In SAS there is the COMPARE procedure that allows the user to compare two datasets to see potential new variables, lost variables, and changes in values. Relying on the COMPARE procedure can be tedious and cumbersome when maintaining a database containing several datasets. The COMPARE_ALL macro was written to ease this burden by generating a Microsoft Excel report of a comparison of two data libraries instead of just two datasets. The report indicates any new or lost datasets, variables or observations and checks for changed data values within all variables. Multiple ID variables can be specified and the macro will determine which variables are relevant with each dataset for comparison. The COMPARE_ALL macro is a fantastic tool for managing multiple versions of the same SAS database.
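
The library-level comparison can be pictured with a small Python sketch. It is an assumption-laden simplification, not the %COMPARE_ALL macro: each library is modeled as a hypothetical dictionary mapping dataset names to sets of variable names, and only new or lost datasets and variables are reported (no observation or value checks).

```python
def compare_libraries(old, new):
    """Compare two libraries given as {dataset name: set of variable names}."""
    report = {
        "new_datasets": sorted(new.keys() - old.keys()),
        "lost_datasets": sorted(old.keys() - new.keys()),
        "variable_changes": {},
    }
    # For datasets present in both freezes, list added and dropped variables.
    for name in sorted(old.keys() & new.keys()):
        added = sorted(new[name] - old[name])
        dropped = sorted(old[name] - new[name])
        if added or dropped:
            report["variable_changes"][name] = {"new": added, "lost": dropped}
    return report

# Hypothetical data freezes.
freeze_1 = {"DM": {"USUBJID", "AGE"}, "AE": {"USUBJID", "AETERM"}}
freeze_2 = {"DM": {"USUBJID", "AGE", "SEX"}, "LB": {"USUBJID", "LBTEST"}}
print(compare_libraries(freeze_1, freeze_2))
```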

AD-146 : MKADRGM: A Macro to Transform Drug-Level SDTM Data into Traceable, Regimen-Level ADaM Data Sets
Sara McCallum, Harvard T.H. Chan School of Public Health, Center for Biostatistics in AIDS Research (CBAR)

In our support of the NIH funded AIDS Clinical Trials Group (ACTG) and International Maternal Pediatric Adolescent AIDS Clinical Trials (IMPAACT) networks, participants concurrently take multiple medications for the treatment of HIV and other diseases in many of our studies. For analysis and presentation to investigators, our statisticians find an ADaM data set that aggregates all drugs in a regimen at a given time into a single record, where that record represents the period of time those drugs were taken, is needed to summarize regimens, identify regimen changes or treatment gaps, and to aid in explaining other events on study (e.g., occurrence of adverse events or emergence of drug resistance). This paper will present our organization’s SAS macro, MKADRGM. The macro accepts WHO Drug coded CDISC domains recorded at the drug level, and outputs an ADaM data set at the regimen level with start and stop dates. Other variables in the analysis data set include the regimen duration, drug counts, and drug class information. An intermediate data set is used to facilitate traceability to the source SDTM domain(s). This macro was developed for HIV regimens, but also works with other therapeutic areas our organization is involved with, including TB, HCV, and HBV. This paper will highlight some of the inner workings of the macro, and demonstrate the novel usage of the SRC* triplicate (SRCVAR, SRCDOM, and SRCSEQ) to provide traceability even when multiple source SDTM records and domains are mapped to a single regimen in the analysis data set.
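
To make the drug-level to regimen-level aggregation concrete, here is a minimal Python sketch of one way to do it. It is not the MKADRGM macro: the record layout (drug, start day, end day, with inclusive study days), the drug codes, and the function name are hypothetical, and traceability variables such as SRCVAR, SRCDOM, and SRCSEQ are not modeled.

```python
def build_regimens(drug_records):
    """Collapse (drug, start_day, end_day) records into regimen intervals.

    Days are inclusive study days. Returns a list of
    (start_day, end_day, sorted tuple of drugs); gaps with no drug are skipped.
    """
    # Every start day, and every day after an end day, is a possible regimen change.
    cuts = sorted({s for _, s, _ in drug_records} | {e + 1 for _, _, e in drug_records})
    regimens = []
    for start, next_start in zip(cuts, cuts[1:]):
        end = next_start - 1
        # The set of drugs taken is constant between two consecutive cut points.
        active = tuple(sorted({d for d, s, e in drug_records if s <= start and e >= end}))
        if not active:
            continue  # treatment gap
        # Extend the previous interval when the drug set is unchanged and contiguous.
        if regimens and regimens[-1][2] == active and regimens[-1][1] + 1 == start:
            regimens[-1] = (regimens[-1][0], end, active)
        else:
            regimens.append((start, end, active))
    return regimens

# Hypothetical drug-level records.
records = [("TDF", 1, 30), ("FTC", 1, 30), ("EFV", 1, 14), ("DTG", 20, 30)]
for regimen in build_regimens(records):
    print(regimen)
```

Each output row is one regimen period, so a regimen change shows up as a new row and a treatment gap as a break between consecutive rows.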

AD-169 : Comprehensive Guide: Road Map to Process MedDRA/WHODrug Dictionary Up-Version
Binyang Zhang, FMD K&L
Yejin Huang, FMD K&L

A consistent, uniform dictionary version for coded terms in adverse event, concomitant medication, and medical history analysis datasets is necessary to meet analysis requirements and support submissions. Especially for integrated studies, standardizing and updating the dictionary version has become routine for regulatory submissions. Each sponsor/CRO has its own standard or non-standard working process for this up-versioning. Regardless of the level at which the up-version is done (individual SDTM vs. integrated ADaM), it requires thorough planning and joint effort from study teams. This paper introduces a comprehensive downstream processing structure. With elaboration of every up-version step, together with sample code and template files, a time-saving and cost-effective standard up-version process can be readily established across the programming group, which is responsible for programmatic auto-coding, and the functional group responsible for manual recoding. The process is achievable at both the individual SDTM level and the integrated ADaM level. Apart from the last step, merging coded dictionary terms back into the data, which needs extra attention and improvisation, the whole process can potentially be fully automated with macros or utility programs. The process flows for MedDRA and WHODrug dictionary up-versioning are similar and share the same high-level steps. The details of the WHODrug up-version process are more delicate and require more attention, and they are discussed thoroughly in this paper. There are five major components of the up-version exercise: constructing coding files for the coding team, manual coding, verification, reconciliation/confirmation from the study sponsor, and merging coded terms back into the data.

AD-171 : Clinical Database Metadata Quality Control: Two Approaches using SAS and Python
Craig Chin, Fred Hutch
Lawrence Madziwa, Fred Hutch

A well-designed clinical database requires well-defined specifications for Case Report Forms, Field Attributes, and Data Dictionaries. The specifications are passed on to the Electronic Data Capture Programmers, who program the clinical database for data collection. How can a study team ensure that the source specifications are complete, and the resulting clinical database metadata match the source specifications? This paper presents two approaches for comparing metadata between source specifications and resultant clinical database. Initially, SAS was used to read in the specifications and clinical database metadata and to provide comparison checks. Then, a new project in Python was initiated in order to build a more user-friendly tool that allows customers to run the checks themselves. This project made the study database development a more efficient process and improved the quality of our clinical databases.

AD-199 : Efficiently Prepare GMP Compliant Batch Records for Drug Manufacturing and 503A/503B Pharmacy Compounding, with SAS
Robert Macarthur, PharmD, MS, BCSCP, Rockefeller University

In the GMP realm, “If it’s not written down, it didn’t happen”. The batch record (BR) is key in addressing this mandate. It must bring together diverse types of data, processes, and documents, most of which are time-sensitive and staged, in order to fulfill Good Manufacturing Practice (GMP) requirements. There are different types of BRs. Here we include Master Batch Records, Control Batch Records, Master Formulation Records, and Compounding Records. All BRs must be produced in a controlled manner. Any improvements or changes made within or between BRs must be rigorously documented. Most drug manufacturers and pharmacies produce BRs in one of two ways: either by piecing them together manually using static forms or by relying upon inflexible, non-integrated specialized database programs/platforms. SAS tools can efficiently produce high quality BRs that meet GMP requirements. The SAS based BR production process is especially useful for production of phase 1/2 clinical trial supplies, small batch biologicals, and personalized medications produced by 503A/503B pharmacies. The latter typically compound hundreds of different medicines for use in clinics, hospitals, and by patients, many with short turn-around time requirements. The advantages are that both new and existing product BRs are produced rapidly and efficiently, no investment in new or exotic software is required, the process can be managed by a SAS programmer working with operations-level staff, compliance with 21CFR11 data archiving is maintained, and validation and quality-by-design processes can be applied. We shall describe the SAS-generated BR in detail, including operation, production, validation, and QA/QC.

AD-203 : R4GSK: The Journey to Integrate R Programming into the Clinical Reporting Process
Benjamin Straub, GlaxoSmithKline
Michael Rimler, GlaxoSmithKline

Historically, clinical research and pharmaceutical drug development have relied heavily on the SAS® programming language for database transformations and generation of analysis displays for regulatory submissions. Recently, the industry has witnessed a growing interest in open source languages such as R and Python as an alternative to SAS for many activities related to clinical research. At GSK, we began our journey of integrating R into clinical reporting in early 2019, using R for independent programming of displays developed in SAS. This paper focuses on the design of our integration roadmap such as the decision to begin with validation programming instead of production programming, the value of developing a solid core of reference code, and the internal (and cross-functional) development of clinical reporting tools available to programming staff. The paper will discuss our experiences to date, challenges faced, and the next steps in the journey (including expected future challenges). For example, one challenge was the juxtaposition of the organic upskilling of SAS programmers along a standard career path relative to a desired acceleration upskilling of R programming capability for existing staff. Another challenge was the development of quality reference code that adheres to good programming practice, is readable and well-commented, and prioritizes programming in the tidyverse (an organizationally desired programming practice with R).

AD-208 : Programming Patient Narratives Using Microsoft Word XML files
Yuping Wu, PRA Health Science
Jan Skowronski, Genmal A/S

Patient narratives are an important component of a clinical study report (CSR). The ICH-E3 guidelines require a brief narrative to describe each death, each serious adverse event and other significant adverse events that are judged to be of special interest because of clinical importance. Unlike individual tables and listings in a CSR, patient profiles present most, if not all, of the information collected for a subject over the course of a clinical trial. Such documents typically integrate the various sources of data into one cohesive output, with each part presented in a different format, which poses challenging problems for both programmers and medical writers. This paper introduces a new method that uses SAS together with Office Open XML. The layout of the narrative Word template is created separately with properly tagged cells. The Word file is then broken up into its XML parts, which are read and updated by SAS with the clinical data and finally saved back into the original format, which now contains a populated narrative.

AD-211 : Semi-Automated and Modularized Approach to Generate Tables for Clinical Study Reports (CSR)
William Wei, Merck & Co, Inc.

A table, listing and figure (TLF) package is included in a clinical study report (CSR) to summarize the data from a clinical trial, with tables making up the majority of the TLF package. Most of the CSR tables summarizing safety and efficacy can be described by two major categories or modules: 1) categorical (e.g., gender, sex, race, etc.) and 2) continuous (e.g., age, time to event, etc.). This paper will discuss the design of the continuous and categorical modules and a macro-driven, semi-automated approach to develop these modules in a table. The semi-automated tool is easy to use and requires minimal or no programming to create the most frequently used tables in a CSR. Other TLF tables that are neither continuous nor categorical can also be automated and are described in this paper. This approach will significantly reduce the programming effort and time needed to produce CSR tables.

AD-296 : SAS Migration from Unix to Windows and Back
Valerie Williams, ICON Clinical Research

SAS® software has been a mainstay for performing analysis and reporting (A&R) in the pharmaceutical industry for the past four decades. SAS occasionally releases new versions of its software with code enhancements and bug fixes to SAS® functions, procedures, and/or elements. In order to exploit those changes, SAS users may need to upgrade their programming environment and migrate their SAS programs and data to a new platform. Migrating A&R directories across operating systems has proven to be the most challenging of any migration process, but much can be learned from migrating to a different operating system twice. Resources from SAS, IT, and all business units that use or plan to use SAS must work together to perform the appropriate level of testing and documentation, to ensure that proper directory permissions are in place and that the new environment is fit for purpose, without affecting business unit delivery timelines. This paper will endeavor to describe, for a mid-level SAS programmer, the process and pitfalls of migrating SAS programs and data across SAS versions, SAS platforms (native SAS to SAS GRID), and operating systems, and to provide a few helpful tips for avoiding problems when migrating.

AD-298 : RStats: A R-Shiny application for statistical analysis
Sean Yang, Syneos Health Clinical
Hrideep Antony, Syneos Health USA
Aman Bahl, Syneos Health

This paper will introduce the RStats application, an interactive and dynamic R-Shiny based application that can perform popular statistical analysis models that are frequently used in clinical trials. The application, as shown in Figure 1, can perform even advanced analytics such as logistic regression, survival analysis, bootstrap confidence intervals, ANOVA/ANCOVA, etc., along with summary plots, for users with no prior R programming experience or with limited statistical knowledge. This application also eliminates the need for numerous lines of programming to create a similar statistical analysis. As with any software program, there is usually more than one way to do things in R. The RStats application in this paper is one of the ways to perform these statistical analyses. The process flow diagram in Figure 1 explains the steps to perform the statistical analysis using this application. Step 1: Import the dataset for the analysis. Step 2: Select the dependent and independent variables and the treatment variable needed for analysis. Step 3: Choose the right statistical model. Step 4: The output and plots are displayed, with the option to save the outputs in an external folder location. The advantage of using R-Shiny is that the interface can be easily accessed using the Web. Shiny allows R users to put data insights into the hands of the decision-makers while providing a user-friendly framework that does not require any additional toolsets.

AD-308 : Using Data-Driven Python to Automate and Monitor SAS Jobs
Julie Stofel, Fred Hutchinson Cancer Research Center

This paper describes how to integrate Python and SAS to run, evaluate, and report on multiple SAS programs with a single Python script. It discusses running SAS programs in multiple environments (latin-1 or UTF-8 encoding; command-line or cron submission) and ways to avoid potential Python version issues. A handy SAS-to-Python function guide is provided to help SAS programmers new to Python find the appropriate Python method for a variety of common tasks. Methods to find and search files, run SAS code, read SAS data sets and formats, return program status, and send formatted emails are demonstrated in Step-by-Step instructions. The full Python script is provided in the Appendix.
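
One piece of such a workflow, evaluating a SAS log after a job has run, might look like the Python sketch below. It is only an assumption about how status could be derived, not the script from the paper: the function names, the status labels (FAILED, CHECK, OK), and the sample log text are hypothetical, and running SAS itself is left out.

```python
import re

# SAS log messages of interest start with "ERROR" or "WARNING", optionally
# followed by a message number, then a colon (for example "ERROR 22-322:").
ISSUE_PATTERN = re.compile(r"^(ERROR|WARNING)(?: \d+-\d+)?:")

def summarize_log(log_text):
    """Count ERROR and WARNING lines in the text of a SAS log."""
    counts = {"ERROR": 0, "WARNING": 0}
    for line in log_text.splitlines():
        match = ISSUE_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def job_status(counts):
    """Map issue counts to a simple, hypothetical status label."""
    if counts["ERROR"]:
        return "FAILED"
    if counts["WARNING"]:
        return "CHECK"
    return "OK"

# Hypothetical log excerpt.
sample_log = """NOTE: The data set WORK.DM has 10 observations and 5 variables.
WARNING: The data set WORK.ADSL may be incomplete.
ERROR 22-322: Syntax error, expecting one of the following: ;.
NOTE: The SAS System stopped processing this step because of errors.
"""
counts = summarize_log(sample_log)
print(counts, job_status(counts))
```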

AD-333 : Integrated P-value summary table for expedited review
Bharath Donthi, Statistics & Data Corporation
Lingjiao Qi, Statistics & Data Corporation

In proof of concept or natural history studies, we may examine numerous endpoints across multiple study populations and may generate several p-values for the purpose of identifying trends and generating hypotheses. The summary tables which contain descriptive statistics for each drug investigated, as well as p-values from inferential statistical tests which are used to evaluate the efficacy and safety of new drugs, may span across thousands of pages, and the reviewer has to spend significant time to identify the statistically significant p-values. This paper presents a practical approach for generating an integrated summary table that displays the resulting p-values originating from varying statistical tests of multiple endpoints in a clinical trial. The p-values are color-coded based on the statistical significance level and directionality of effect (e.g. favoring the active treatment), enabling reviewers to quickly understand the clinical trial data. In addition, each p-value in this integrated summary table can be embedded with hyperlinks routing to the source table, a supporting listing, or any related figures to ease navigation within the statistical report. The review of the tables in their entirety is essential for holistic understanding of the product’s effects but including the integrated p-value summary table will draw attention to endpoints with significant p-values and expedite the review process.
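
The color-coding rule can be sketched in a few lines of Python. The 0.05 threshold, the color names, and the function name are illustrative assumptions; the paper's actual scheme may differ.

```python
def pvalue_color(p_value, favors_active, alpha=0.05):
    """Pick a cell color from significance and direction of effect."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p_value >= alpha:
        return "none"  # not significant: leave the cell unshaded
    # Significant: the shade depends on which arm the effect favors.
    return "green" if favors_active else "red"

print(pvalue_color(0.012, favors_active=True))
print(pvalue_color(0.300, favors_active=True))
```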

AD-341 : Opening Doors for Automation with Python and REST: A SharePoint Example
Mike Stackhouse

A large part of increasing the efficiency of a process is finding new ways to automate. If manual components of a process can be replaced with automation, those components can then move to the background where they can be validated and trusted to be reliable. Large swaths of data sit within company intranets or other services that may seem ‘disconnected’ – but they may be more connected than one might think. This paper will explore extending automation capabilities with Python and REST APIs, using SharePoint as a primary example. Some topics covered will include using Python to make web requests, the basics of authentication, and how to interact with REST APIs. The paper will demonstrate how data repositories that may seem disconnected can be integrated into automated processes, opening doors for new data pipelines and data sources to pave the way for process improvements, efficiency increases, and better information.

AD-347 : A Tool to Organize SAS Programs, Output and More for a Clinical Study
Yang Gao, Merck & Co., Inc.

Depending on the complexity of a study, there can be many programs and outputs to manage and review, or even rerun, due to new data or requirements. Programmers must navigate to different program subfolders to select and run the affected programs and revalidate as required. This paper presents a utility macro which can substantially reduce the resources and time spent on this task. This utility macro captures all SAS programs or output from a specified folder and saves them in a permanent Excel file. The resulting Excel file provides the path and associated file link. The embedded hyperlinks for these SAS programs and outputs save the time otherwise spent manually navigating to individual folders and subfolders. In addition, this macro includes functionality to list the output folders, such as rtf and log, with the corresponding SAS programs in the same spreadsheet. Programmers can quickly find the corresponding programs and outputs. The hyperlinks facilitate output review for programmers and statisticians when there is a need for updates. This utility can also generate a SAS program to enable a batch run of the programs without needing to hard code the program names. This utility macro improves programming efficiency by reducing manual effort and decreasing errors during the process. This utility is also useful for supporting multiple studies and reviewing work completed by a partner.

Artificial Intelligence (Machine Learning)

AI-014 : Do you know what your model is doing? How human bias impacts machine learning.
Jim Box, SAS Institute

Machine learning is one of the hottest topics in clinical research, and there is a big push to implement ML in as many areas as possible. To use ML effectively and ethically, you really need to understand how the models work and how human bias can creep in and skew results. We'll explore sources of bias and misunderstanding and discuss ways to combat their influence.

AI-025 : Gradient Descent: Using SAS for Gradient Boosting
Karen Walker, Walker Consulting LLC

Much of applied mathematics is centered around the evaluation of ERROR in an effort to reduce it during computations. This paper will illustrate mathematical equations applied to thousands of calculations and then show algorithms that are proven to optimize these calculations. It will show ways to reduce the error through a gradient descent algorithm and render it as a SAS program. Next, this paper will use the gradient descent program to find the smallest ERROR when matching two models and subsequently introduce the SAS Viya GRADBOOST procedure. Finally, it will show how more precision is achieved using gradient descent with gradient boosting and why this is so important to convolutional neural network models.
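
For readers unfamiliar with the technique, a minimal gradient descent can be sketched in plain Python (the paper itself renders the algorithm in SAS). This example fits a straight line by repeatedly stepping against the gradient of the mean squared error; the data, learning rate, and step count are arbitrary illustrative choices.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = slope * x + intercept by minimizing mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        # Step against the gradient to reduce the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = gradient_descent(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

A learning rate that is too large makes the updates diverge, so in practice it is tuned to the scale of the data.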

AI-058 : Pattern Detection for Monitoring Adverse Events in Clinical Trials - Using Real Time, Real World Data
Surabhi Dutta, EG Life Sciences

Patient care involves data capture from disparate sources of care delivery. This includes clinical trials data, sensor data from wearable sensors, hand held devices and Electronic Health Records. We are accustomed to devices that generate health indicator data in large volumes and at a rapid rate. This paper discusses the benefits, challenges, and methods of utilizing this real-time data for pattern detection using machine learning algorithms. This will be done using real world data that has been standardized and integrated during clinical trials. Our paper from the previous year, “Merging Sensor Data in Clinical Trials”, dealt with standardizing clinical trials data and making it ready to use for analysis. This year we will delve deeper into using the sensor data for pattern recognition and AE monitoring by classifying and segregating specific groups of high risk patients participating in clinical trials, right from first subject first data point. This kind of pattern detection will also be used in patient profiling and predicting risk factors for high risk patients in trials. Challenges with the current method of patient monitoring in clinical trials: any drug related AEs are usually documented at the end of the episode. This poses significant time and monetary risks for sponsors in ensuring patient safety, achieving drug efficacy and conducting the trials in a timely manner. This paper will explore patient profiling using machine learning techniques like clustering and PCA to segregate high risk patients for close monitoring, and using predictive analytics for visit based monitoring.

AI-061 : How I became a Machine Learning Engineer from Statistical Programmer
Kevin Lee, Genpact

The most popular buzzword nowadays in the technology world is “Machine Learning (ML).” Most economists and business experts foresee Machine Learning changing every aspect of our lives in the next 10 years through automating and optimizing processes. This is leading many organizations to seek experts who can implement Machine Learning in their businesses. The paper is written for statistical programmers who want to become Machine Learning Engineers, explore a Machine Learning career, or add Machine Learning expertise to their experience. The paper will discuss my personal journey from statistical programmer to Machine Learning Engineer. It will share what motivated me to start a Machine Learning career, how I started it, and what I have learned and done to become a Machine Learning Engineer. The paper will also discuss the future of Machine Learning in the statistical programming environment.

AI-224 : How to let Machine Learn Clinical Data Review as it can Support Reshaping the Future of Clinical Data Cleaning Process
Mirai Kikawa, Novartis Pharma K.K.
Yuichi Nakajima, Novartis

Technology utilized in the pharmaceutical industry has been evolving. There are many innovative new technologies, such as Artificial Intelligence, Machine Learning, Digitization, Blockchain, Big Data, Open Source Software, etc., which can build a new era of clinical drug development. Manual data review is one of the processes required to ensure the clinical data cleanliness and readiness for analysis that are essential for patient safety and the reliability of the submission documents. The manual data review process involves several roles, such as Data Manager, Clinical and/or Medical Reviewer, Safety Reviewer, etc. Since it requires complicated logical thinking and clinical and medical knowledge and expertise, it has to be “manual”. That has been the common understanding, and thus the traditional approach. However, does that have to remain true? In recent years, clinical data collected during clinical trials have been structured and standardized through industry efforts such as the introduction of CDISC and standard operational processes at each pharmaceutical company. Structured and standardized data across clinical trials increase the compatibility of data utilization, which enables a more robust approach to data review. Such data can be fed into Machine Learning using Python, which is one idea for breaking with the traditional approach and reshaping the future of clinical data cleaning. This paper proposes a potential way to let a machine learn clinical data review using Python.

AI-242 : Automate your Safety tables using Artificial Intelligence & Machine Learning
Roshan Stanly, Genpro Research Inc.
Ajith Baby Sadasivan, Genpro Research
Limna Salim, Genpro Life Sciences

For FDA submissions, a common reporting strategy for analysis is to create tables which specify the output of each statistical analysis. One such set of tables displays the counts and statistics of subjects’ safety parameters such as adverse events, laboratory results, vital signs, etc. With the advent of big data technologies and high-throughput computing resources, large complex data sets can be easily analyzed and safety table generation can be automated. This paper explores the possibilities of developing a dynamic software framework using Angular JS, SAS®, R, Python® and a Named Entity Recognition (NER) model for easy and effective analysis of safety data. The only input files needed are the standardized ADaM datasets. The system has standardized templates for most of the safety tables, which vary depending upon the study design (such as single arm, multiple arm, crossover, etc.). As a first step, the table shells are selected by the user from the various templates offered. Next, we extract contents such as Titles, Headers, Parameters & Sub-Parameters, Statistics, Footnotes, etc. as specific entities from the table shell and create a CSV file using Python. A map file is also created alongside, using Artificial Intelligence and Machine Learning, which specifies which variables are to be considered for each parameter, header or descriptive statistic generated from the ADaM datasets. Standard macros written in SAS then take both CSV files and the ADaM datasets as input and generate the final table in RTF format.

AI-314 : SAS® Viya®: The R Perspective
Ganeshchandra Gupta, Ephicacy Consulting Group

The value of the effective use of data is universally accepted, and analytical analysis methods such as machine learning make it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results. However, before any such value can be realized, the data must be collected, moved, cleansed, transformed, and stored as efficiently and quickly as possible. SAS Viya not only addresses complex analytical challenges but can also be used to speed up data management processes. SAS Viya is a high-performance, fault-tolerant analytics architecture that can be deployed on both public and private cloud infrastructures. Although SAS Viya can be used by various SAS applications, it also enables you to access analytics methods from SAS, R, Python, Lua and Java, as well as through a REST interface using HTTP or HTTPS. SAS Viya consists of multiple components. The central piece of this ecosystem is SAS Cloud Analytic Services (CAS). CAS is the cloud-based server that all clients communicate with to run analytical methods. The R client is used to drive the CAS component directly using objects and constructs that are familiar to R programmers. In this paper we focus on the perspective of SAS Viya from R.

AI-324 : Emergence of Demand Sensing in Value, Access and Reimbursement Pharma Divisions
Daniel Ulatowski, Teradata

Determining the health economic value and price of a medical asset depends on many standard and non-standard forces that vary across markets. Calculating these forces accurately and across teams, which would allow for improved reimbursement strategy realization, has been a topic of debate in a traditionally underserved division. However, market access divisions are making headway in leveraging a wide variety of high-volume, additive calculations from external data sets in order to support a range of business requirements. Given the additive rule structure of health policies, reimbursement forecasts, risk adjustments and market forces, calculations necessary to determine optimal assets for go-to-market commercial strategies have a layering effect. Demand sensing and algorithmic forecasting are effective across multiple industries. Supplementing econometric demand forecasting with unconventional data sources can answer a variety of questions that surround product innovation and forecasting. In this paper we discuss the adoption of additional data sets to move pharma into a modern demand sensing-style product evaluation process.

AI-358 : Identifying Rare Diseases from Extremely Imbalanced Data using Machine Learning
Xuan Sun, Ultragenyx Pharmaceutical Inc.

Rare diseases are much harder to identify and diagnose than common diseases, since there are not enough data and experts. Most existing machine learning methods are not effective in predicting rare diseases since they are biased towards the majority class. In this paper, we apply imbalance-aware machine learning methods based on over-sampling and under-sampling techniques that outperform traditional machine learning approaches, and results show that they can identify people with rare diseases with a low misclassification rate.
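
As a rough illustration of the over-sampling and under-sampling ideas mentioned in this abstract, the sketch below rebalances a toy dataset using only the Python standard library. It is not the author's method: the function names, the toy records, and the choice of plain random resampling are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (illustrative only): 8 "common" records and 2 "rare" records.
rows = [[i] for i in range(10)]
labels = ["common"] * 8 + ["rare"] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

In practice, resampling would be applied to the training split only, so that duplicated records do not leak into the evaluation data.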

Data Standards

DS-012 : Dynamically Harvesting Study Dates to Construct and QC the Subject Visits (SV) SDTM Domain
Cleopatra DeLeon, PPD, Inc
Laura Bellamy, PPD, Inc

Clinical trials collect vast amounts of dates associated with actual subject encounters. These data are used to provide context to the timing of events and findings of the trial data. Combining these study dates into one FDA-compliant database while appropriately handling data entry errors common to clinical studies can feel like a Herculean effort. It is no wonder why the Subject Visits (SV) domain is so often found non-compliant or left out of the submission completely. This presentation will demonstrate a concise data-driven methodology for leveraging SASHELP views and CALL EXECUTE statements to dynamically harvest dates to develop the SV domain while leveraging logical, sequential and other internal quality checks to handle data entry errors. In addition, this approach to data-driven SDTM domain development can be applied to the related Subject Elements (SE) domain, which is also discussed.
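
The general idea of harvesting dates into one row per subject visit, followed by a sequential check, might look like the sketch below. The presentation itself uses SASHELP views and CALL EXECUTE; this Python version is only an analogy. The input layout, the function names, and the specific sequence check are assumptions, while SVSTDTC and SVENDTC are the standard SV start and end date variables.

```python
from collections import defaultdict


def build_sv(records):
    """Collapse (usubjid, visitnum, iso_date) records into one row per subject visit."""
    dates = defaultdict(list)
    for usubjid, visitnum, dtc in records:
        if dtc:  # ignore records with a missing date
            dates[(usubjid, visitnum)].append(dtc)
    sv = []
    for (usubjid, visitnum), dtcs in sorted(dates.items()):
        # Complete ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
        sv.append({"USUBJID": usubjid, "VISITNUM": visitnum,
                   "SVSTDTC": min(dtcs), "SVENDTC": max(dtcs)})
    return sv


def flag_out_of_sequence(sv):
    """Return visits that start before the previous visit of the same subject started."""
    flagged, previous = [], {}
    for row in sv:
        last = previous.get(row["USUBJID"])
        if last is not None and row["SVSTDTC"] < last["SVSTDTC"]:
            flagged.append((row["USUBJID"], row["VISITNUM"]))
        previous[row["USUBJID"]] = row
    return flagged


# Hypothetical dates gathered from several source datasets for one subject.
records = [
    ("S01", 1, "2020-01-06"),
    ("S01", 1, "2020-01-07"),
    ("S01", 2, "2020-01-02"),  # likely a data entry error: earlier than visit 1
    ("S01", 3, "2020-02-03"),
    ("S01", 3, ""),            # missing date
]
sv = build_sv(records)
suspect = flag_out_of_sequence(sv)
print(sv)
print(suspect)
```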

DS-023 : Data Transformation: Best Practices for When to Transform Your Data
Janet Stuelpner, SAS
Olivier Bouchard, SAS
Mira Shapiro, Analytic Designers LLC

When is the best time to create the CDISC standard data? This has been debated for many, many years. Some say that it should be done at the very end of the study before the protocol is submitted. Some say to transform the data at the very beginning of the study as subjects start to enroll. And some do it as needed as the study is enrolling, the data is being cleaned and the shells of the tables, listings and figures are in the process of creation. This is a great forum for experts in the field to give their opinion as to how and when to perform the transformation into CDISC format.

DS-031 : Ensuring Consistency Across CDISC Dataset Programming Processes
Jennifer Fulton, Westat

Whether you work for a small start-up, a mid-level CRO, or a Fortune 500 biopharma company, CDISC compliance is a daunting prospect at the start of any project and requires creative solutions and teamwork. Consistency is the hallmark of a CDISC project and was the impetus for the formation of the consortium. Each CDISC project should be approached consistently as well, leading to improved accuracy and productivity, regardless of staff skill and experience. This approach leads to a quality final product and FDA approval, the ultimate goal. Westat approached this goal by developing an overarching CDISC development and delivery checklist. It provides a visual of the scope of a CDISC project, organizes work instructions and templates for users, and helps assure critical steps are not missed. Like the 26 miles in a marathon, our paper will lay out 26 steps to CDISC compliance, along with tools and techniques we have developed, to help others reach the finish line.

DS-062 : Untangling the Subject Elements Domain
Christine McNichol, Covance

The Subject Elements (SE) domain is unique and challenging in its sources and mapping. Without much encouragement, a map of the sources, derivations and interaction between records in SE can start to look more like a tangled mass of spaghetti than common linear SDTM mapping. Why is this? SE can have multiple sources for the data points needed to derive each element’s start and end. Compared to other domains, SE also has a good deal more direct correlation to values in other domains. For successful implementation, it is critical to understand SE’s purpose and requirements, the unique mapping path from source to SE and how this differs from other common domains, the steps needed to successfully derive SE, and programming methods that can be used. This paper will explore the inner workings of SE and explain how to successfully create SE one manageable bite at a time.

DS-080 : Standardised MedDRA Queries (SMQs): Programmers Approach from Statistical Analysis Plan (SAP) to Analysis Dataset and Reporting
Sumit Pratap Pradhan, Syneos Health

Standardised MedDRA Queries (SMQs) are groupings of MedDRA terms, ordinarily at the Preferred Term level, that relate to a defined medical condition. SMQ information is provided by MedDRA in the form of SMQ files (SMQ_LIST, SMQ_CONTENT) and the Production SMQ Spreadsheet. One of these should be used to create a look-up table/SAS dataset, which is merged with the adverse event dataset to derive SMQ information in the analysis dataset, which in turn is used for reporting. The question is how this SMQ information should be implemented at the study level: How many SMQs are involved? Is a Customized Query also involved? Which reports should be generated based on SMQs? The answers to these questions can be found in the SAP. This paper covers all of these topics step by step, with examples that will help even a novice programmer understand and implement SMQs at the study level. First, the basics of SMQs (Narrow and Broad Scope, SMQ Category, Algorithm Search, Hierarchical Structure) are explained. Then detailed guidelines are given on how a programmer can create a look-up table from the SMQ Spreadsheet or SMQ files. Next, the variables capturing details of SMQs per the Occurrence Data Structure (OCCDS) version 1.0 are explained. Scenarios are then presented to help a user decide whether to use a Standardised MedDRA Query, a Customized Query, or both. Finally, an example SAP is shown to explain how to decode the SAP along with SMQ implementation at the analysis dataset (ADAE) and reporting level.
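
The look-up-and-merge step described above can be sketched as follows. The paper works in SAS, so this Python version is only an analogy, and the SMQ name, preferred terms, and function names are placeholders rather than real MedDRA content. The output variables SMQ01NAM and SMQ01SC assume the OCCDS SMQzzNAM/SMQzzSC naming pattern and should be checked against the OCCDS document.

```python
def build_lookup(smq_rows):
    """Map each upper-cased preferred term to its scope within one SMQ."""
    return {pt.upper(): scope for pt, scope in smq_rows}


def add_smq(adae, smq_name, lookup):
    """Left-join the look-up onto adverse event records by preferred term (AEDECOD)."""
    out = []
    for rec in adae:
        scope = lookup.get(rec["AEDECOD"].upper())
        new = dict(rec)
        new["SMQ01NAM"] = smq_name if scope else ""  # assumed OCCDS-style name variable
        new["SMQ01SC"] = scope or ""                 # assumed OCCDS-style scope variable
        out.append(new)
    return out


# Placeholder SMQ content: (preferred term, scope). Not real MedDRA terms.
smq_rows = [("Term A", "NARROW"), ("Term B", "BROAD")]
adae = [
    {"USUBJID": "S01", "AEDECOD": "TERM A"},
    {"USUBJID": "S01", "AEDECOD": "Term C"},
    {"USUBJID": "S02", "AEDECOD": "term b"},
]

adae_smq = add_smq(adae, "Hypothetical SMQ", build_lookup(smq_rows))
for rec in adae_smq:
    print(rec)
```

Records whose preferred term is not in the look-up keep blank SMQ variables, which mirrors a left join in which the adverse event dataset drives the output.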

DS-082 : Implementation of Immune Response Evaluation Criteria in Solid Tumors (iRECIST) in Efficacy Analysis of Oncology Studies
Weiwei Guo, Merck

In recent years, immunotherapy has gained attention as one of the most promising types of cancer treatment on the horizon. Immunotherapy, also called biologic therapy, is a type of cancer treatment that boosts the body's natural defenses to fight cancer. While conventional RECIST criteria have served us well in evaluating chemotherapeutic agents, in immuno-oncology, a small percentage of patients manifest a new response pattern termed pseudoprogression, in which, after the initial increase in tumor burden or after the discovery of new lesions, a response or at least a prolonged stabilization of the disease can occur. Tumors respond differently to immunotherapies compared with chemotherapeutic drugs, raising questions about analysis of efficacy. Therefore, a novel set of anti-tumor assessment criteria, iRECIST, was published to standardize response assessment among immunotherapy clinical trials. In this paper, the difference between the RECIST and iRECIST assessment criteria is described first; then a step-by-step implementation of iRECIST in efficacy analysis of solid tumor oncology studies using investigator assessment (INV) is provided, starting from data collection up to the final statistical analysis.

DS-109 : Impact of WHODrug B3/C3 Format on Coding of Concomitant Medications
Lyma Faroz, Seattle Genetics
Jinit Mistry, Seattle Genetics

The WHODrug dictionary is the industry standard for coding concomitant medications. As CDISC becomes more prevalent and strongly recommended by regulatory authorities for submission, the dictionary has evolved over time to ensure full compliance. It is maintained by the Uppsala Monitoring Centre (UMC), with updates provided to industry users twice every year, on 1st March and 1st September. The previous WHODrug B2/C formats have now been up-versioned to B3/C3, which makes WHODrug-coded data fully compliant with the expectations of regulatory authorities and brings heightened efficiency and other benefits to the industry. The field length changes between the older and newer formats affect the mapping of coded concomitant medication data according to SDTM CM guidelines. In addition, per a notice in the Federal Register published by FDA in October 2017, the use of the B3 format is required in submissions of studies starting after 15th March 2019. Hence, it is critical for statistical programmers to learn and be aware of these updates and apply them in new studies. In this paper, we will describe how the WHODrug B3 and C3 formats relate to the U.S. FDA Data Standards Catalog, shed light on aspects relevant to statistical programmers receiving concomitant medication data in these formats, and illustrate efficient ways of handling them in the SDTM CM domain in full compliance with CDISC standards and regulatory submission expectations.

DS-110 : Demystifying SDTM OE, MI, and PR Domains
Lyma Faroz, Seattle Genetics
Sruthi Kola, SVU

CDISC’s SDTM IG is an extensive repository of domain metadata that helps organize clinical trial data into relevant and detailed classifications. With rapid advancements in new drug development, patients now have superior and expansive options for treatment. These innovations in medicine necessitate continual updates to the SDTM IG. However, study implementations may not always keep pace with these updates, thereby not fully utilizing valuable resources available through the IG. This paper highlights three such lesser-known SDTM domains which allow statistical programmers to more efficiently structure study data for downstream analysis and submission. We will also share sample CRFs as part of our case study on these domains:
Ophthalmic Examinations (OE): Added in SDTM IG v3.3, this is part of the Findings class. It contains assessments that measure ocular health and visual status to detect abnormalities in the components of the visual system and determine how well the person can see.
Microscopic Findings (MI): Also part of the Findings class, it holds results from the microscopic examination of tissue samples performed on a specimen which is prepared with some type of stain. An example is biomarkers assessed by histopathological examination.
Procedures (PR): This is part of the Interventions class. This domain stores details of a subject’s therapeutic and diagnostic procedures such as disease screening (e.g., mammogram), diagnostic tests (e.g., biopsy), imaging techniques (e.g., CT scan), therapeutic procedures (e.g., radiation therapy), and surgical procedures (e.g., diagnostic surgery).

DS-117 : CDISC-compliant Implementation of iRECIST and LYRIC for Immunomodulatory Therapy Trials
Kuldeep Sen, Seattle Genetics
Sumida Urval, Seattle Genetics
Yang Wang, Seattle Genetics

The current RECIST and LUGANO criteria are designed to assess efficacy of traditional chemotherapeutic regimens in solid tumor and lymphoma trials, respectively. They are less suitable to assess efficacy of regimens studied in immunotherapy trials, as these may cause tumor flares during treatment which can be associated with clinical and imaging findings suggestive of progressive disease (PD). As a result, without a more flexible interpretation, some patients in such trials might be prematurely removed from a potentially beneficial treatment, leading to underestimation of the true magnitude of the clinical benefit of the agent under investigation. For this reason, iRECIST and LYRIC guidelines were introduced in solid tumor and lymphoma immunotherapy trials, respectively. This paper focuses on the implementation of the additional response criteria introduced by these guidelines, namely “Unconfirmed PD (iUPD)” and “Confirmed PD (iCPD)” by iRECIST and “Indeterminate Response (IR)” by LYRIC. This paper will demonstrate how iUPD, iCPD, and IR data can be collected on the CRF, mapped into SDTM and ADaM, and reported efficiently. We will share our experience with challenges and solutions from an implementation perspective along this entire data flow, with emphasis on CDISC compliance and effective reporting.

DS-133 : Is Your Dataset Analysis-Ready?
Kapila Patel, Syneos Health
Nancy Brucken, Clinical Solutions Group

One of the fundamental principles of ADaM is that analysis datasets should be analysis-ready, which means that each item displayed on an output table, listing or figure can be generated directly from the dataset. The ADaM datasets produced by assuming a 1:1 relationship between SDTM and ADaM datasets may not be analysis-ready, especially if the actual analysis requirements have not been considered. This paper will provide several examples of how an SDTM domain can be split into more than one ADaM dataset to meet analysis needs, and show that the SDTM domain class does not have to drive the class of the resulting ADaM datasets.

DS-195 : Standardizing Patient Reported Outcomes (PROs)
Charumathy Sreeraman, Ephicacy Lifescience Analytics

Any health outcome directly reported by the subject in a trial is referred to as a Patient-Reported Outcome (PRO). It is an addendum to the data reported by the investigator and/or study staff who are conducting the trial. Patient-reported data helps in better understanding the subject’s perspective. In addition to describing physiological effects, it is critical in evaluating the safety and efficacy of the drug administered. Patient-reported data is typically collected through a subject diary. A subject diary, often called a patient diary, is a tool used during a clinical trial or a disease treatment to assess the patient's condition or to measure treatment compliance. The use of digitized patient-reported data is on the rise in today's health research setting. A subject diary can collect information about daily symptoms, daily activities, safety assessments, usage of the study medication to measure compliance, usage of concomitant medication, and disease episodes on a frequent basis. In this presentation, we will explore the standardisation of diary data with standard SDTM domains in different therapeutic areas.

DS-196 : Simplifying PGx SDTM Domains for Molecular biology of Disease data (MBIO).
Sowmya Srinivasa Mukundan, Ephicacy
Charumathy Sreeraman, Ephicacy Lifescience Analytics

Pharmacogenomics/genetics examines how the genetic makeup of an individual affects his/her response to drugs. It deals with the influence of acquired and inherited genetic variation on drug response in patients by correlating genetic expression with pharmacokinetics (drug absorption, distribution, metabolism and elimination) and pharmacodynamics (effects mediated through a drug's biological targets). The purpose of the SDTMIG-PGx is to provide guidance on the implementation of the SDTM for biospecimen and genetics-related data. The domains presented in the SDTMIG-PGx are intended to hold data that fall into one of three general categories: data about biospecimens, data about genetic observations, and data that define a genetic biomarker or assign it to a subject. The paper will shed some light on the mapping challenges encountered in MBIO data, illustrated with sample CRF pages.

DS-226 : From the root to branches, how Standards Teams utilize Decision Trees to accept or reject Standard Change Requests.
Donna Sattler, BMS
Sharon Hartpence, BMS

Oh No! You’re a Data Manager and your study team is requesting data collection that is not in your company’s standards library. How many hoops do you need to jump through in order to get what your study team needs? Do you have a handy set of questions to ask yourself in order to do some pre-work yourself, before you torture your standards team? How do they know what you really need? Do you have a CRF to start from? Do you have some recommended labels? Be prepared with a gap analysis and even a protocol rationalization to help them out. Have you considered what other teams need to know about the new collection? Careful preparation of change control requests, including the use of a decision tree, can speed up the triage and approval process and have your study team building their database in no time!

DS-235 : Tackle Oncology Dose Intensity Analysis from EDC to ADaM
Song Liu, BeiGene
Cindy Song, BeiGene Corporation
Mijun Hu, BeiGene Corporation
Jieli Fang, BeiGene Corporation

Exposure analysis in oncology can be complicated when the dose is adjusted based on body weight changes from baseline, or when the treatment is administered on 2-5 consecutive days at different frequencies rather than as one infusion within one treatment cycle. This paper demonstrates how we mapped exposure data collected in EDC to SDTM.EX and how we built the ADaM exposure analysis data. Dose intensity involves some critical parameters: the planned dose intensity (mg/m2/cycle), which is the planned dose per cycle as defined in the protocol; and the actual dose intensity (ADI), defined as the total cumulative dose taken divided by the treatment duration. Without BSA being derived in EDC, it becomes complicated to standardize the dose taken within one cycle with EXDOSE (mg/m2/cycle) for a treatment that is given on several days and then withheld for the remainder of the 21-day cycle. It becomes more challenging still to check dose adjustment when weight has changed >=10% from the baseline, or from the previous weight used as the new baseline, without BSA in EDC. We requested BSA from the IRT system and merged it with the exposure data by visit to solve the problem. Since chemotherapy includes different treatments in this study, the treatment duration, total actual dose taken, and total treatment cycles are all unique to each treatment and critical for deriving dose intensity. For table programming efficiency, instead of using PARCAT for each treatment, we created five ADEXmed datasets, where med represents each treatment name, keeping the ADaM structure and variable names the same for each treatment's data.
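
The two calculations named in this abstract, actual dose intensity and the 10% weight-change check, can be sketched as below. This is a minimal illustration rather than the authors' derivation: the function names, the choice to express treatment duration in cycles, and the sample values are all assumptions.

```python
def actual_dose_intensity(doses_mg_m2, duration_cycles):
    """ADI = total cumulative dose taken / treatment duration (here assumed to be in cycles)."""
    if duration_cycles <= 0:
        raise ValueError("treatment duration must be positive")
    return sum(doses_mg_m2) / duration_cycles


def weight_change_exceeds(reference_kg, current_kg, threshold=0.10):
    """True when weight differs from the reference weight by at least the threshold fraction."""
    return abs(current_kg - reference_kg) / reference_kg >= threshold


# Hypothetical subject: one summed dose per cycle (mg/m2) over three cycles.
doses = [100.0, 100.0, 70.0]
adi = actual_dose_intensity(doses, duration_cycles=3)
print(adi)                                # mg/m2/cycle

print(weight_change_exceeds(70.0, 80.0))  # about a 14% change
print(weight_change_exceeds(70.0, 72.0))  # about a 3% change
```

For a treatment given on several consecutive days, the daily doses within a cycle would first be summed to one value per cycle before being passed to `actual_dose_intensity`.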

DS-248 : Best practices for annotated CRFs
Amy Garrett, Pinnacle 21

There is no doubt that the SDTM annotated CRF (aCRF) is one of the most cumbersome submission documents to create. Once a purely manual task, the extreme burden required to create the aCRF has led to several novel methods to automate or partially automate the process. As the industry moves away from manually annotating CRFs and towards automation, it’s more important than ever to truly understand the properties of a high-quality aCRF. This paper reviews published guidance from regulatory agencies and provides best practices for CRF annotations. Following these best practices will ensure your aCRF fulfills current regulatory requirements, as well as meets the needs of internal users and programs.

DS-256 : A codelist’s journey from the CDISC Library to a study through Python
Mike Molter, PRA Health Sciences

The publication of the CDISC Library should be every programmer’s dream. The use of PDF-based Implementation Guidelines or even Excel files downloaded from the CDISC website always produced manual, non-automated hiccups to the process of standards implementation. The Library gets us one step closer to automation nirvana. In this paper we’ll present a small-scale proof-of-concept, in which a study team member selects a subset of terms from a standard codelist that are appropriate for a specific study. We’ll see how just a few lines of Python code can extract from the Library; how a few more can send contents of a codelist to an HTML form, and on the backend; and how a few more can process the choices a user made with their browser – all without Excel operations such as copying and pasting. The purpose of this exercise is not to demonstrate a fully functioning production web application, but rather, to give the reader a sense of what is possible. Knowledge of basic Python objects such as lists and dictionaries is helpful, but not essential.
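
The study-level subsetting step could look something like the sketch below, which works on a codelist already held in memory and makes no network request. The dictionary layout, the key names, and the sample terms are assumptions for illustration and are not a statement of the CDISC Library response format.

```python
def select_terms(codelist, chosen):
    """Return a copy of the codelist restricted to the chosen submission values, in original order."""
    chosen_set = set(chosen)
    known = {term["submissionValue"] for term in codelist["terms"]}
    unknown = sorted(chosen_set - known)
    if unknown:
        # Reject picks that are not in the standard codelist rather than silently dropping them.
        raise ValueError(f"not in codelist: {unknown}")
    kept = [t for t in codelist["terms"] if t["submissionValue"] in chosen_set]
    return {**codelist, "terms": kept}


# Hypothetical codelist, shaped as a plain dictionary of Python lists and dicts.
standard = {
    "name": "Hypothetical Unit Codelist",
    "terms": [
        {"submissionValue": "kg"},
        {"submissionValue": "g"},
        {"submissionValue": "mg"},
    ],
}

# Values a study team member might tick in a browser form.
study = select_terms(standard, ["mg", "kg"])
print([t["submissionValue"] for t in study["terms"]])
```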

DS-261 : SUPPQUAL datasets: good bad and ugly
Sergiy Sirichenko, Pinnacle 21

SUPPQUAL datasets were designed to represent non-standard variables in SDTM tabulation data. There have been many recent discussions about whether the SDTM Model should allow the addition of non-standard variables directly to General Observation Class domains instead of using SUPPQUAL datasets. However, there is still a lack of implementation metrics across the industry to understand actual utilization of SUPPQUAL datasets. In this presentation we will summarize metrics from many studies and different sponsors to produce an overall picture of utilization of SUPPQUAL datasets by the industry. We will analyze commonly used SUPPQUAL information for potential promotion to standard SDTM variables. We will also provide and discuss examples of correct and incorrect utilization of SUPPQUAL datasets in submission data to understand whether the industry is ready to switch from SUPPQUAL datasets to non-standard variables.

DS-285 : Know the Ropes - Standard for Exchange of Non-clinical Data (SEND)
Pravinkumar Doss, Agati Clinical Informatics LLP

The CDISC SEND model is an implementation of the Study Data Tabulation Model (SDTM) for the submission of nonclinical studies. The SEND Model was first developed in July 2002, utilizing domains described in the CDER 1999 Guidance and the SDTM as its basis. Feedback from the first pilot with the FDA in 2003 and continuous efforts to align SEND with SDTM led to the development of SEND 2.3. By 2007, an FDA pilot was announced; this was the real stimulus that brought the model into widespread use. Since that time, more than forty volunteers representing pharmaceutical companies, vendors, and CROs have created close to thirty domains. All studies started after December 15, 2016 supporting IND and BLA submissions will need to be compliant with SEND. The Pharmaceuticals and Medical Devices Agency in Japan will enforce its use in the future, most probably in 2020. The European Medicines Agency has also expressed interest and is recommending the use of SEND. This paper will provide an overview and the history of SEND. It first explains how test results, examinations, and observations for subjects of a nonclinical study are represented in a series of SEND domains. Second, it covers the process of mapping and converting nonclinical study data to the SEND structure following SENDIG 3.1, and its validation. Lastly, it addresses compliance checks, conformance to the CDISC model, and creation of the Define-XML, the roadmap for creating overall content for agency submission purposes.

DS-295 : Get Ready! Personalized Medicine is here and so is the data.
Bhargav Koduru, Sarepta Therapeutics Inc.
Andi Dhroso, Sarepta Therapeutics Inc.
Tanya Teslovich, Sarepta Therapeutics Inc.

Pharmacogenomics (PGx) is the study of how genes affect an individual’s response to drugs. As of January 2020, there are approximately 130 PGx clinical trials registered in, at various stages of enrollment and/or development, mostly focused on improving drug safety by drug optimization, leveraging genetic stratification factors such as single nucleotide polymorphisms (SNPs). Next generation sequencing (NGS) is a high-throughput technology widely used to sequence the DNA and RNA of study participants. In 2005 for the first time, FDA released a “Guidance for Industry: Pharmacogenomics Data Submissions,” which called for voluntary genomic data submission. Since then only a few new documents highlighting PGx in general have surfaced. However, as the amount of available genomic data has increased as the technology has become cheaper and more accessible, there is a need for clear recommendations on streamlining the use of genetic data. This led the FDA to issue various regulatory guidelines on NGS, including the latest Guidance on “Submitting Next Generation Sequencing Data to the Division of Antiviral Products,” released in July 2019. CDISC published the clinical data standard for genomics and biomarker data, SDTMIG-PGx, in May 2015, to support the increasing availability of genomic data in clinical trials, but the complexity of mapping genomic data still intimidates many clinical programmers. This paper will introduce NGS and how to map data, leveraging the recent guidelines from both FDA and CDISC. This paper also demonstrates that a close collaboration with cross-functional experts, such as bioinformaticians, is key to best utilize genomic data.

DS-306 : SDTM to ADaM Programming: Take the Leap!
Alyssa Wittle, Covance, Inc.
Jennifer McGrogan, Covance, Inc.

Does ADaM programming ever feel intimidating or outside of your expertise? Then this presentation is tailored just for you. Why is it time to take the leap into learning more about ADaM or even to dive into ADaM programming? The more you understand about the path clinical data goes through to arrive at the analysis, the better a programmer you will be! This presentation will highlight why it is important and exciting to learn about ADaM theory and ADaM programming. Where do you even start when learning about this topic? We will give some recommendations for effective ways to learn about ADaM and ADaM programming without getting overwhelmed in all the very technical details up front. We will give examples of common pitfalls and solutions often encountered when just starting out in ADaM. Finally, to assist in your success in this learning experience, we will give some tips and tricks to help you along the way.

DS-318 : Blood, Bacteria, and Biomarkers: Understanding the scientific background behind human clinical lab tests to aid in compliant submissions
Maddy Wilks, Precision for Medicine, Oncology and Rare Disease
Sara Warren, Precision for Medicine

Often statistical programmers play the role of bridging the gap between scientific analysis and the pragmatic application of CDISC standards. This requires a wide scope of understanding of many parts of the clinical trial process. With the rise of precision medicine, laboratory tests can be niche. If laboratory tests come from external laboratories, they can also be challenging to interpret. If a spreadsheet column simply titled “Lactobilli” contains “Yes” or “No” responses with no other context, deciding where to map this data compliantly can be a challenge. If it becomes necessary through the duration of a study to focus analysis on a lipid panel, but a specific lipids CRF page does not exist, what laboratory tests are important to include, and what characteristics of the method and content of collection might render them not evaluable? If a sponsor collects biomarker data, is the Pharmacogenomics SDTM dataset always the best place for it? Understanding the unique characteristics of the source data and the standardization intent of compliance datasets is necessary to create an accurate submission. Sometimes SAPs and Protocols simply do not have enough information to guide an informed compliance and mapping decision. A high-level foundational knowledge base of human clinical laboratory tests can be an invaluable tool. This paper aims to equip programmers with the tools to make informed compliance decisions in their own workflow by using biologic information, TAUGs, and other regulatory resources that serve the scientific aims of the sponsor.

DS-323 : Considerations in Implementing the CDASH Model v1.1 and the CDASH Implementation Guide v2.1
Jerry Salyers, TalentMine

In September of 2017, the CDASH team published their first Model and Implementation Guide. This was part of a continuing effort to better align with SDTM, which, of course, has published both an overarching Model along with an accompanying Implementation Guide for many years. Similar to SDTM, the first CDASH Model provided a general framework for creating fields to collect information on CRFs and includes the model metadata. The Model provides root naming conventions for CDASHIG variables that are intended to facilitate mapping into SDTMIG variables. In conjunction with this first Model, the team published the CDASH Implementation Guide v2.0 along with the associated CDASHIG Metadata Table. As the CDASHIG states, “The informative content of the CDASHIG and the normative content metadata table comprise the CDASHIG and must be referenced together.” In November of 2019, the CDASH team published the newest versions of the CDASH Model (v1.1) and the CDASH Implementation Guide (v2.1). Again, this update allows for CDASH to make further progress in “catching up” to the SDTM and the SDTM Implementation Guide. Both versions to date of the CDASH Model and Implementation Guide are aligned with SDTM v1.4 and SDTMIG v3.2. This paper will focus on highlighting the progress that CDASH continues to make to more fully align with SDTMIG v3.2 and to pave the way for further development in the CDASH Foundational Standard.

DS-329 : Overcoming Pitfalls of DS: Shackling 'the Elephant in the Room'
Soumya Rajesh, SimulStat
Michael Wise, Syneos Health

Disposition (DS) is a standard SDTM domain that has been around since the inception of SDTM. Although familiar, it has often been misinterpreted or misused. Unlike other SDTM domains, direct mapping from CRF pages presents challenges within DS. For example, CRF values may not be a perfect fit for the terms defined in controlled terminology code-lists, especially as seen in 'End of Study' or 'End of Treatment' pages. When a code-list does not have an exact match with the CRF text, you may need to request NCI to extend the code-list or add new terms. This, however, may create problems because not every “new term” should extend a code-list. Also, it’s important to understand the differences between criteria so that DSCAT / DSDECOD values are assigned appropriately. From an annotation perspective, this means that if the values in a variable are 'assigned', the variable should not be annotated on the CRF. This paper will guide you through the mapping of CRF pages to DS and illustrate how to choose appropriate controlled terminology for variables like DSCAT, DSDECOD and EPOCH, so that you can overcome the pitfalls of DS and shackle that 'elephant in the room'.

DS-339 : Extracting Metadata from the CDISC Library using SAS with PROC LUA
Lex Jansen, SAS Institute Inc.

The CDISC Library is the single, trusted, authoritative source of CDISC standards metadata. It uses linked data and a REST API to deliver CDISC standards metadata in a machine-readable format to software applications that automate standards-based processes. This paper shows how metadata can be extracted from the CDISC Library using a REST API request in SAS. PROC LUA will be used to manage the PROC HTTP requests and to parse JavaScript Object Notation (JSON) response strings to extract data before storing them in SAS data sets.

DS-344 : Trial Sets in Human Clinical Trials
Fred Wood, Data Standards Consulting Group

The Trial Sets table has been included in the SDTM since Version 1.3, published in 2012. The only implementation guide in which it appears, however, is the SEND Implementation Guide (SENDIG). The Trial Sets dataset (TX) allows for the subsetting of subjects within an Arm (treatment path) and facilitates the “grouping” of multiple Arms together. A Trial Set represents the most granular subdivision of all the experimental factors, treatment factors, inherent characteristics, and distinct sponsor designations as specified in the design of the study. Within a nonclinical trial, each animal is assigned to a Set in addition to an Arm. The Set Code (SETCD) variable is Required in the SEND DM dataset. While there is no such requirement in the SDTMIG DM dataset, Trial Sets has potential uses in human clinical trials, particularly when the randomization or the study design is based on factors other than treatment (e.g., subjects who have undergone previous heart surgery vs. those who have not). This presentation will provide an introduction to Trial Sets as used in nonclinical studies as well as examples of how this dataset could be used in human clinical trials.

DS-351 : Have You Heard of the TD Domain? It Might Not be What You Think it is.
Lynn Mullins, PPD
Ajay Gupta, PPD Inc

The Study Data Tabulation Model (SDTM) has been the standard for collecting and organizing clinical research data for many years. The Trial Design Domains are included in these standards to store information regarding the design of the clinical trial. However, there is a little-known trial design domain called Trial Disease Assessment (TD). This paper aims to describe the purpose and contents of the TD domain and to show examples of its use.

DS-365 : Creating SDTMs and ADaMs CodeList Lookup Tables
Sunil Gupta, TalentMine

Do you need to review and confirm codelist values for variables in SDTMs and ADaMs? Codelists are lists of unique values for key variables such as LBTEST, AVISIT and AVISITN. Codelists need to be cross-referenced with controlled terms. Codelist dictionary compliance checks are very important but are often neglected. Since all raw data is now standardized to controlled terms, there are many opportunities for cross-checking SDTMs and ADaMs against codelist dictionary tables. In addition, the define.xml file must have a correct and updated codelist section. This presentation shows, for example, how to automatically create a codelist dictionary across all ADaMs and SDTMs and how to compare codelist dictionaries from SDTMs with define.xml specifications. Both examples are essential to meet CDISC compliance. Note that without an automated process to create codelist dictionaries, the alternative method of applying PROC FREQ to all categorical variables is very time consuming.
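The cross-check described in this abstract can be illustrated outside SAS. The following is a minimal Python sketch, not the author's macro: the function names, the in-memory dataset layout, and the sample values are all hypothetical. It collects the unique values of selected variables across datasets and reports values that are absent from a specification codelist.

```python
from collections import defaultdict


def build_codelist_dictionary(datasets, variables):
    """Collect sorted unique non-missing values of each variable across datasets.

    datasets: {dataset_name: [row_dict, ...]} (hypothetical in-memory layout).
    """
    codelists = defaultdict(set)
    for rows in datasets.values():
        for row in rows:
            for var in variables:
                value = row.get(var)
                if value not in (None, ""):
                    codelists[var].add(value)
    return {var: sorted(values) for var, values in codelists.items()}


def compare_codelists(observed, specified):
    """Return, per variable, values present in the data but not in the spec."""
    differences = {}
    for var, values in observed.items():
        extra = sorted(set(values) - set(specified.get(var, [])))
        if extra:
            differences[var] = extra
    return differences


# Illustrative sample data only.
datasets = {
    "LB": [{"LBTEST": "Glucose"}, {"LBTEST": "Hemoglobin"}],
    "ADLB": [
        {"LBTEST": "Glucose", "AVISIT": "Week 1"},
        {"LBTEST": "Sodium", "AVISIT": "Baseline"},
    ],
}
spec = {"LBTEST": ["Glucose", "Hemoglobin"], "AVISIT": ["Baseline", "Week 1"]}

observed = build_codelist_dictionary(datasets, ["LBTEST", "AVISIT"])
unexpected = compare_codelists(observed, spec)
print(unexpected)
```

In this sample, "Sodium" appears in the data but not in the specification, so it is the only value reported.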

DS-368 : Good versus Better SDTM – Some Annoying Standard Dictionary Issues
Mario Widel, Independent
Henry Winsor, Relypsa Inc

FDA expects that medical event and medication data will be provided using standard coding dictionaries, MedDRA for the medical events and WHODrug for the medications. The authors have noted some interesting misconceptions about each dictionary while working on clinical data over the years and would like to bring them to your attention. In particular, too many people think that the intermediate MedDRA terms (HLT, HLGT) are unnecessary and can be omitted, and that the generic name for a medication is always determined by the drug record number, with fixed values for the two sequence numbers. The authors discuss some of the historical reasons for these two and other misconceptions, show that they are no longer true and illustrate a time-saving benefit of the new WHODrug B3 dictionary design.

Data Visualization and Reporting

DV-004 : Library Datasets Summary Macro %DATA_SPECS
Jeffrey Meyers, Mayo Clinic

The field of clinical research often involves sharing data with other research groups and receiving data from other research groups. This creates the need to have a quick and concise way to summarize incoming or outgoing data that allows the user to get a grasp of the number of datasets, number of variables, and number of observations included in the library as well as the specifics of each variable within each dataset. The CONTENTS procedure can fulfill this role to an extent, but the DATA_SPECS macro uses the REPORT procedure along with the Excel Output Delivery System (ODS) destination to create a report that is fine tuned to summarize a library. The macro produces a one page overview of the datasets included in the specified library, and then creates a new worksheet for each dataset that lists all of the variables within that dataset along with their labels, formats, and a short distribution summary that varies depending on variable type. This gives the user an overview of the data that can be used in documents such as data dictionaries, and demonstrates an example of the powerful reports that can be generated with the ODS Excel destination.

DV-006 : What's Your Favorite Color? Controlling the Appearance of a Graph
Richann Watson, DataRich Consulting

The appearance of a graph produced by the Graph Template Language (GTL) is controlled by Output Delivery System (ODS) style elements. These elements include fonts, line and marker properties as well as colors. A number of procedures, including the Statistical Graphics (SG) procedures, produce graphics using a specific ODS style template. This paper provides a very basic background of the different style templates and the elements associated with the style templates. However, sometimes the default style associated with a particular destination does not produce the desired appearance. Instead of using the default style, you can control which style is used by indicating the desired style on the ODS destination statement. However, sometimes none of the 50-plus styles provided by SAS® achieves the desired look. Luckily, you can modify an ODS style template to meet your own needs. One such style modification is to control what colors are used in the graph. Different approaches to modifying a style template to specify the colors used are discussed in depth.

DV-009 : Great Time to Learn GTL: A Step-by-Step Approach to Creating the Impossible
Richann Watson, DataRich Consulting

Output Delivery System (ODS) graphics, produced by SAS® procedures, are the backbone of the Graph Template Language (GTL). Procedures such as the Statistical Graphics (SG) procedures dynamically generate GTL templates based on the plot requests made through the procedure syntax. For this paper, these templates will be referenced as procedure-driven templates. GTL generates graphs using a template definition that provides extensive control over output formats and appearance. Would you like to learn how to build your own template and make customized graphs and how to create that one highly desired, unique graph that at first glance seems impossible? Then it’s a Great Time to Learn GTL! This paper guides you through the GTL fundamentals while walking you through creating a graph that at first glance appears too complex but is truly simple once you understand how to build your own template.

DV-029 : A Templated System for Interactive Data Visualizations
Sean Lacey, Enanta Pharmaceuticals

With the development of tools like Shiny for R and Dash for Python, industry demand for interactivity in deliverables is on the rise. This paper proposes a templated solution for interactive data visualizations that combines the data processing power of SAS® with the customization and interactivity of an HTML file utilizing the JavaScript D3 library. Through the templating of an HTML file and utilization of PROC STREAM and PROC JSON, SAS® combines the various puzzle pieces into a single, interactive HTML file. An interactive Adverse Event swimmer’s plot provides an example of one possible implementation.

DV-050 : Oncology Graphs-Creation (Using SAS and R), Interpretation and QA
Taniya Muliyil, Bristol Myers Squibb

Data visualization plays a key role in analyzing and interpreting data. Oncology graphs help to visualize, interpret and analyze trends in data from a statistical perspective. Graphical outputs help in exploring data and identifying issues with data, and in turn help to improve data quality. The statistical software packages most commonly used for creating oncology graphs in the pharmaceutical industry are SAS and R. Programmers create complex graphical outputs using these statistical tools, but many are unable to interpret the results that these graphs display. The ability to interpret results helps to identify issues with the graphs or the data used to generate these outputs. This in turn also helps to verify these graphical outputs. This paper focuses on creating some common oncology graphs, such as the spider plot, swimmer plot and waterfall plot, using R and SAS, along with interpreting the results displayed by these graphs. It also discusses common QA findings; awareness of these findings will reduce issues while generating these outputs and in turn help with statistical interpretation and analysis.

DV-057 : R for Clinical Reporting, Yes – Let's Explore It!
Hao Meng, Seattle Genetics Inc.
Yating Gu, Seattle Genetics, Inc.
Yeshashwini Chenna, Seattle Genetics

RStudio® is an interpreted programming language-based software application which can be an ideal platform for statistical analysis and data visualization. For biostatisticians and programmers in the pharmaceutical and biotech industry, it offers a wide and rapidly growing range of user-developed packages containing functions which can efficiently manipulate complex data sets and create tables, figures, and listings. While SAS® remains a critical tool in these industries, the popularity of RStudio® in both academia and the clinical industry increased exponentially over the last decade due to its free availability, easy access, flexibility, and efficiency. As RStudio® is gaining popularity and given that regulatory agencies have not endorsed any particular software for clinical trial analysis and submission, understanding the competency of RStudio® and being well-positioned to use it in a clinical data reporting environment is a worthy endeavor. This paper introduces a pragmatic application of RStudio® by describing an actual use case of clinical trial data manipulation and export (e.g., SDTM and ADaM to XPORT format), creation of tables, figures, and listings, and a simulation use case. We will share the RStudio® packages used as well as pros and cons between RStudio® and SAS®. This paper also provides a brief background to the RStudio® platform and our programming environment set-up, along with relevant statistical programming details.

DV-066 : Simplifying the Derivation of Best Overall Response per RECIST 1.1 and iRECIST in Solid Tumor Clinical Studies
Xiangchen (Bob) Cui, Alkermes, Inc
Sri Pavan Vemuri, Alkermes

The objective tumor response rate (ORR) is one of the endpoints in solid tumor clinical studies per FDA guideline [1]. It plays a critical role in earlier phase oncology studies. It is based on the best overall response (BOR), which is defined as the best response across all time-point responses. Response Evaluation Criteria in Solid Tumors version 1.1 (RECIST 1.1) [2] has been quickly adopted since 2009, and iRECIST [3] has been recommended for use in trials testing immune therapeutics by the RECIST working group since March 2017. These two guidelines are applied to derive the overall response at each time point (both on-treatment and follow-up). For non-randomized trials, both RECIST 1.1 and iRECIST require the confirmation of complete response (CR) and partial response (PR). Moreover, under iRECIST, unconfirmed progressive disease (iUPD) needs confirmation. These confirmations pose challenges in statistical programming during the derivation of BOR. This paper presents a new approach to overcome these challenges by illustrating the logic and data flow for the derivation of BOR. The new technique simplifies the process and ensures technical accuracy and quality. Furthermore, traceability for the derivation of the date of BOR and its result is also built into this “simple” process. The sharing of hands-on experiences in this paper is intended to help readers apply this methodology to prepare an ADaM dataset for the reporting of ORR to further support clinical development of cancer drugs and biologics.
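To illustrate why confirmation complicates the derivation of BOR, here is a deliberately simplified Python sketch. It is not the authors' approach and does not implement the full RECIST 1.1 or iRECIST rules. It assumes that a CR or PR counts only if a later assessment, at least `min_days` afterwards and before any progression, is at least as good; that an unconfirmed CR or PR falls back to SD; and that assessments after the first PD are ignored. The function name, the 28-day default, and the fallback rule are assumptions made for this example.

```python
# Ranking of time-point responses from best to worst (NE = not evaluable).
RANK = {"CR": 4, "PR": 3, "SD": 2, "PD": 1, "NE": 0}


def best_overall_response(assessments, confirm=True, min_days=28):
    """Derive a simplified BOR from (study_day, response) pairs.

    Simplifying assumptions (not the full RECIST 1.1 rules):
      - assessments after the first PD are ignored;
      - with confirm=True, CR/PR count only if a later assessment at least
        min_days afterwards is at least as good, otherwise they become SD.
    """
    # Keep assessments in time order, up to and including the first PD.
    window = []
    for day, resp in sorted(assessments):
        window.append((day, resp))
        if resp == "PD":
            break

    candidates = []
    for i, (day, resp) in enumerate(window):
        if confirm and resp in ("CR", "PR"):
            confirmed = any(
                later_day - day >= min_days and RANK[later_resp] >= RANK[resp]
                for later_day, later_resp in window[i + 1:]
            )
            candidates.append(resp if confirmed else "SD")
        else:
            candidates.append(resp)

    # With no assessments at all, report NE.
    return max(candidates, key=RANK.get, default="NE")


print(best_overall_response([(28, "PR"), (56, "PR"), (84, "PD")]))  # confirmed PR
print(best_overall_response([(28, "PR"), (56, "PD")]))              # unconfirmed PR
```

Calling the function with `confirm=False` returns the unconfirmed best response, which makes the effect of the confirmation requirement easy to compare on the same data.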

DV-132 : Automation of Flowchart using SAS
Xingshu Zhu, Merck
Bo Zheng, Merck

Flowchart diagrams are commonly used in clinical trials because they provide a direct overview of the process, including participant screening, recruited patient enrollment status, demographic information, lab test results, etc. Flowcharts are particularly useful when working on pharmacoepidemiology studies, which require dynamic study designs due to the unpredictability and variation in data sources. In this paper, we introduce a simple SAS macro that allows users to create customized flowchart diagrams that fit a customer’s individual needs by selecting between two different methods and applying them to various types of source data.

DV-135 : Making Customized ICH Listings with ODS RTF
Huei-Ling Chen, Merck
William Wei, Merck & Co, Inc.

ICH (International Council for Harmonisation) data listings are common reports prepared by pharmaceutical companies for regulatory submission. Such listings are often submitted in RTF format. This paper presents a technique that can be used to efficiently produce a customized ICH Abnormal Lab Listing based on a company’s uniform RTF output standards. The technique has five components: data preparation, plus four SAS® features (SYSTEM OPTIONS, ODS TAGSETS.RTF, PROC TEMPLATE style, and PROC REPORT) that are used to define the layout and render the data listing table. This paper will first describe the problem that we are trying to solve and then give details on the ODS TAGSETS.RTF options that we have chosen to solve this problem with mock data.

DV-148 : Butterfly Plot for Comparing Two Treatment Responses
Raghava Pamulapati, Merck

A butterfly plot is an effective graphical representation for comparing two treatment responses for the same subject across different time points. Butterfly plots are drawn across a centered axis to separate two treatment responses in one picture. They accommodate a subject’s two treatment responses in one plot, such as current treatment response with prior study/non-study treatment response, active treatment response with control treatment response, or monotherapy response with combination therapy response. Butterfly plots can be created using SAS® PROC SGPLOT. The HBAR statement creates horizontal bars along the Y-axis. Each bar represents one subject. The RESPONSE option displays the duration of study medications on the X-axis. The FILLATTRS option colors the bars by response category, using variables derived for each treatment and response type, with the corresponding descriptions displayed in the plot legend using the KEYLEGEND statement. More specific details on the PROC SGPLOT syntax and plot options will be presented in the body of the paper. Furthermore, the steps to derive the required variables and the dataset pre-processing needed to categorize responses will be discussed.

DV-157 : Automating of Two Key Components in Analysis Data Reviewer’s Guide
Shunbing Zhao, Merck & Co.
Jeff Xia, Merck

The Analysis Data Reviewer’s Guide (ADRG) provides reviewers with additional context for analysis datasets received as part of a regulatory submission. It is crucial to submit a clear, concise and precise ADRG. The ADRG consists of seven sections (1. Introduction, 2. Protocol Description, 3. Analysis Considerations Related to Multiple Analysis Datasets, 4. Analysis Data Creation and Processing Issues, 5. Analysis Dataset Descriptions, 6. Data Conformance Summary, 7. Submission of Programs) and optional appendices. FDA reviewers highly recommend including two key components in sections 4 and 5: a graph showing data dependencies and a table that identifies efficacy and primary outcome datasets. This paper presents three useful SAS macros to automatically create these two key components for the ADRG. The first macro generates the data dependency graph by using SAS Graph Template Language (GTL). The second macro generates the table of “Analysis Dataset Descriptions” in the ADRG. The third macro automatically inserts the generated data dependency graph and the table of “Analysis Dataset Descriptions” into the right place in the ADRG. This innovative approach removes a few trivial steps from the ADRG generation process that previously had to be done manually. It helps to create a clear, concise and precise ADRG.

DV-158 : Enhanced Visualization of Clinical Pharmacokinetics Analysis by SAS GTL
Min Xia, PPD

Graphs are essential tools to comprehensively represent data and improve the readability of an analysis. They are an integral part of Pharmacokinetic (PK) analysis, and high-quality PK graphs are very important for the Clinical Study Report (CSR) and a key task for regulatory submissions. This paper provides instructions for creating PK analysis graphs with SAS Graph Template Language (GTL), such as multi-panel series plots and multi-column forest plots. In addition, DYNAMIC variables and other advanced techniques are introduced to increase the flexibility of GTL and the accuracy of the data visualization.

DV-163 : Plots and Story with Diabetes Data
Yida Bao, Auburn University
Zheran Rachel Wang, Auburn University
Jingping Guo, Conglomerate company
Philippe Gaillard, Auburn University

Graphs have always been more intuitive than cold statistics. SAS gives us quite powerful graphing capabilities. In this project, we use diabetes datasets to do data visualization research. The detection of diabetes can generally be judged by several indicators: glucose, insulin, BMI and so on. In general, our research contains three parts. First, we apply the SAS® procedure PROC GPLOT to create line diagrams, which help us to explore the inner structure between different factors. We will introduce two different methods to generate overlay plots based on the binary detection result. Also, we will apply SAS Enterprise Miner to perform a principal component analysis, which helps us to enrich the project. Later, we’ll use a typical discrimination method, convert the dataset into several canonical variables and generate the proper plot to express the result. Finally, we’ll summarize the results to establish an information hierarchy and share our understanding of the diabetes dataset with other researchers.

DV-164 : Using R Markdown to Generate Clinical Trials Summary Reports
Radhika Etikala, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
Xuehan Zhang, Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch

The scope of the paper is to show how to produce a statistical summary report along with explanatory text using R Markdown in RStudio. Programmers write a lot of reports describing the results of data analyses. There should be a clear and automatic path from data and code to the final report. R Markdown is ideal for this, since it is a system for combining code and text into a single document. I’ve found that R Markdown is an efficient, user-friendly tool for producing reports that do not need constant updating. RStudio is often used in the pharmaceutical industry and health care for analysis and data visualization, but it can also be successfully used for creating reports and datasets for submission to regulatory agencies. This paper presents an RStudio program that demonstrates how to use R Markdown to generate a statistical table showing adverse events (AE) by system organ class (or preferred term) and severity grade along with text that explains the table. Collecting AE data and performing analysis of AEs is a common and critical part of the clinical trial world. A well-developed reporting system like the one generated with R Markdown provides a solid foundation and an efficient approach towards a better understanding of what the data represent.

DV-166 : An Introduction to the ODS Destination for Word
David Kelley, SAS

The SAS® Output Delivery System (ODS) destination for Word enables customers to deliver SAS® reports as native Microsoft Word documents. The ODS WORD statement generates reports in the Office Open XML Document (.docx) format, which has been standard in Microsoft Word since 2007. The .docx format uses ZIP compression, which makes for a smaller storage footprint and speedier downloading. ODS WORD is preproduction in the sixth maintenance release of SAS 9.4. This paper shows you how to make a SAS report with ODS WORD. You learn how to create your report's content (images, tables, and text). You also learn how to customize aspects of your report's presentation (theme and styles). And you learn how to enhance your report with reader-friendly features such as a table of contents and custom page numbering. If you're cutting and pasting your SAS output into Microsoft Word documents, then this paper is especially for you!

DV-191 : Early phase data visualization: how to get ahead in the game
Jing Pan, Translational Clinical Oncology, Novartis Institutes for BioMedical Research, Inc.

Compared with Phase III studies, early phase studies (Phase I/II), and especially FIH studies, allow much less time to prepare data for decision-making, for example at dose escalation or proof of concept meetings. The requirements on data quality and speed are much greater, while the data complexity remains equal. A while back, the industry started experimenting with data review and data visualization tools instead of asking programmers to produce analysis-ready datasets (ADaM) to support dose escalation meetings. However, we do not want programmers to be siloed or sidelined during the decision-making process. We would like to share our effort on accelerating data preparation and data visualization for informed decision-making in different scenarios. The approach is to utilize the interactive visualization tool Spotfire and programming languages such as R and Python. We would like to introduce two applications created using this approach: real-time data visualization and a sample status tracking report. The value of the first application is that it reduces the time between data extraction and data visualization. For eCRF data, we can achieve real-time data review using an interactive visualization tool. In the second application, we extend the approach to an earlier stage. This will help the clinical operations team to better understand the sample status, which is usually very time-consuming and tedious work. This is important because it may otherwise further delay decision-making.

DV-193 : Subset Listings: An Application of Item Stores
Bill Coar, Axio Research

While interactive tools for data review are becoming more common, subject level data listings continue to be a reliable tool. They are sometimes required, as in the case of BIMO listings. A well structured document with hyperlinks and bookmarks can make listings easy to navigate, especially when done by subsets. They can be further enhanced with traffic lighting to highlight records of particular interest. The focus here is on the structure of the document: a single file that contains listings organized by subsets through the use of item stores. An item store is a SAS library member that consists of pieces of information (i.e., procedure output) that can be accessed independently. With item stores, procedure output can be created at one point in time and accessed at a later point in time. Item stores are created using ODS DOCUMENT statements and accessed using PROC DOCUMENT. The proposed approach is to use by-group processing with PROC REPORT for each listing. Once the listings are complete, the item stores can be used to create a single document with all data grouped by subset. The proposed technique has been utilized for BIMO listings (site listings) and patient profiles. This presentation will include an introduction to item stores followed by their application using PROC REPORT. Finally, a single file is created by accessing information for each by-group in each listing independently. The demonstration will be done through a site profile using SAS 9.4 M5 for Windows.

DV-198 : r2rtf – an R Package to Produce Rich Text Format Tables and Figures
Siruo Wang, Johns Hopkins Bloomberg School of Public Health
Keaven Anderson, Merck & Co., Inc., Kenilworth, NJ, USA
Yilong Zhang, Merck & Co., Inc., Kenilworth, NJ, USA

In drug discovery, research and development, the use of open-source R is evolving for study design, data analysis, visualization, and report generation across many fields. The ability to produce customized rich text format (RTF) tables in the R platform becomes crucial to complement analyses. We developed an R package, r2rtf, that standardizes the approach to generating highly customized tables, listings, and figures (TLFs) in RTF format. The r2rtf R package provides flexibility to customize table appearance for table title, subtitle, column header, footnote, and data source. The table size, border type, color, and line width can be adjusted in detail, as well as column width, row height, text format, font size, text color, alignment, etc. The control of the format can be row or column vectorized by leveraging the vectorization in R. Furthermore, r2rtf provides pagination, section grouping, and concatenation of multiple tables for complicated table layouts. In this paper, we give an overview of the r2rtf workflow with three required and four optional easy-to-use functions. Code examples are provided to create customized RTF tables with highlighted features in drug development.

DV-204 : Static to Dynamic: A one language Programmer to a multi-lingual Programmer with RShiny in Six Weeks
Benjamin Straub, GlaxoSmithKline

As SAS programmers, we are witnessing an industry trend that programmers should become multi-lingual programmers. At the same time, heavier use of data visualization tools in Clinical Programming departments is an industry effort to breathe more life into static reports. Adoption of newer data visualization tools that have a programming language as a base appears to be an industry trend, e.g. R/Shiny uses R while Spotfire allows use of R and Python programs. In order to modernize and transform its approach to data exploration, GSK is developing an interactive data visualization tool to be used by study team members, such as Clinicians and Medical Writers. As the first step in this project, GSK developed an R/Shiny Proof of Concept app. The timelines for the Proof of Concept app were tight with only six weeks, and the learning curve was steep. However, the team was given several resources to aid their efforts as well as some built-in determination. In this paper, I share information on how to use R/Shiny to create dynamic reports from CDISC data as well as lessons learned about code structure and available training resources. I will also go over the basics of R/Shiny and highlight the essential code needed to get up to speed quickly with this amazing tool.

DV-214 : Producing Lab Shift Tables for Oncology Study
Haihua Kan, FMD K&L
Ziying Chen, FMD K&L

Oncology is a popular area in clinical trials. A special characteristic of an oncology study compared to other clinical studies is that the toxicity grade (CTC grade) is used for LB and AE (the CTC grades for LB and AE are different). Programmers are usually asked to use this CTC grade to create shift tables. Shift tables directly display the change from baseline to post-baseline. We will use a detailed example to show entry-level SAS programmers, step by step, how to create LB shift tables for both the hyper and hypo cases in oncology studies. We will also show you how to deal with missing cases and how to calculate the totals in shift tables.
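The counting logic behind a shift table can be sketched in a few lines. The Python example below is an illustrative sketch rather than the authors' SAS code: it assumes one toxicity direction at a time (hyper or hypo), summarizes post-baseline values by the worst (maximum) grade, and places subjects with no usable grade in a "Missing" category. The function name, the input layout, and the sample records are hypothetical.

```python
from collections import Counter


def shift_table(records, grades=(0, 1, 2, 3, 4)):
    """Cross-tabulate baseline grade against worst post-baseline grade.

    records: iterable of (subject_id, baseline_grade, post_baseline_grades),
    where a grade of None means missing (hypothetical input layout).
    Returns {baseline_label: {post_label: count, ..., "Total": n}, "Total": {...}}.
    """
    labels = [str(g) for g in grades] + ["Missing"]
    counts = Counter()
    for _subject, base, post in records:
        base_label = "Missing" if base is None else str(base)
        observed = [g for g in post if g is not None]
        # Worst post-baseline grade is taken as the maximum observed grade.
        post_label = str(max(observed)) if observed else "Missing"
        counts[(base_label, post_label)] += 1

    table = {}
    for b in labels:
        row = {p: counts[(b, p)] for p in labels}
        row["Total"] = sum(row.values())
        table[b] = row
    # Column totals, including the grand total in the "Total" column.
    table["Total"] = {p: sum(table[b][p] for b in labels) for p in labels + ["Total"]}
    return table


# Illustrative sample records only.
records = [
    ("01", 0, [1, 2]),
    ("02", 1, [None, 0]),
    ("03", None, [3]),
    ("04", 0, []),
]
table = shift_table(records)
for label, row in table.items():
    print(label, row)
```

Keeping "Missing" as an explicit row and column means every subject is counted exactly once, so the row and column totals always agree with the number of subjects.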

DV-215 : Producing a Swimmer Plot for TEAE by using GTL
Ziying Chen, FMD K&L
Haihua Kan, FMD K&L

It is possible to use a swimmer plot to show how a patient develops during the whole treatment. This kind of plot is extremely important for a clinical trial. We will use an extremely detailed example to show how to produce a swimmer plot for Treatment Emergent Adverse Events (TEAE). ADSL, ADAE, ADCM, and ADEX are used to produce the swimmer plot that can display grade 1 to grade 5 of the TEAE duration in different colors. It can display the serious AE too. This swimmer plot will also display the drugs the patients took during the treatment. Thus, from the swimmer plot, statisticians can easily see when the important events occur during the whole duration of treatment, such as TEAE, serious AE, related concomitant drugs taken, and treatment drugs taken.

DV-227 : Safety Reports using PROC STREAM
Samundeeswari Raja, Ephicacy Life Science Pvt Ltd

Maintaining high competency of data without compromising quality is a necessity in the pharmaceutical industry. Clinical data is supervised throughout the trial cycle by various committees to safeguard ethics and scientific integrity. Reporting for these committees consists mainly of the repetitive work of creating safety reports. An interactive and dynamic visualization of the tables and listings in the safety reports enhances understanding and simplifies the review/validation process. PROC STREAM can be a powerful tool in a statistical programmer’s arsenal since it makes it easy to add these properties to safety reports. The procedure can be used to produce virtually any text-based content, including but not limited to HTML, CSV, RTF, and XML. In this paper, we will discuss the usage of PROC STREAM for PDF/HTML outputs through simple coding for various types of reports.

DV-252 : A Sassy substitute to represent the longitudinal data – The Lasagna Plot
Soujanya Konda, GCE Solutions Pvt Ltd

Data interpretation becomes complex when the data contains thousands of digits, pages, and variables. Generally, such data is represented in a graphical format. Graphs can be tedious to interpret because of voluminous data, which may include various parameters and/or subjects. Trend analysis is the most sought-after approach for maximizing the benefits from a product and minimizing background research such as the selection of subjects. Dynamic representation and sorting of visual data can be used for exploratory data analysis. Additionally, data can be represented as needed: for a specific subject, for an event across the study at specified time points, or for a combination of subject versus event across visits or at specified time points within visits. The Lasagna plot makes the job easier and represents the data in a pleasant way. This paper explains the Lasagna plot.

DV-282 : Hierarchical Data in Clinical Trials - The Need for Visualization and Possible Solutions
Kishore Kumar Sundaram, Agati Clinical Informatics

“A picture is worth a thousand words”. But with the advent of data analytics and the exponential growth of data volume, the saying has evolved to “A picture should be worth more than a thousand words”. The amount of data created by pharmaceutical corporations is growing every year and shows no sign of abating. The problem is that this data is only useful if valuable insights can be extracted from it and acted upon. To do that, decision makers need to be able to access, evaluate, comprehend and act on data. Hierarchical data visualization promises a way to do just that. It offers an effective way to review large amounts of data, spot trends, and identify correlations and unexpected relationships. Hierarchical data visualization involves the presentation of data of almost any type in a graphical format that makes it easy to understand and interpret. Visualizing hierarchical data has a long tradition going back to the drawings of medieval family trees. In the clinical domain, some examples of hierarchical data are MedDRA data (SOC -> HLGT -> HLT -> PT -> LLT), summarization of multiple-choice questionnaire forms, and Endpoint Adjudication Committee (EAC) decision data. In this paper and presentation, we will discuss the various visual representations of hierarchical data, such as Circle packing, Sunburst chart, Dendrogram, and Sankey diagram, that can be generated using SAS and other graphical tools. Their advantages and challenges, along with tips and tricks for generating them using SAS, will be discussed in detail.

DV-283 : Programming Technique for Line Plots with Superimposed Data Points
Chandana Sudini, Merck & Co., Inc
Bindya Vaswani, Merck & Co., Inc

Line graphs are mainly plotted by connecting associated data points over a specified time interval to portray an overall trend. Despite the existence of many visualization methods and techniques, line plots continue to be a simple way of displaying quantitative data patterns in exploratory analyses. Line plots can be used to present either summary statistics such as mean, standard deviation of a population or individual subject level data over a time period. In a scenario where we need to understand the impact of concomitant medication (CM) on the laboratory measurements (LB) for each subject, presenting line plots with subject level data from those two sources can be extremely challenging, since the y-axis value at the time of CM administration may be unknown. In order to accomplish this, we propose using the properties of a straight line to predict the potential y-value at the time of the CM administration. We can derive these predicted values by programmatically fitting a coordinate between the LB data points, before and after the CM administration (using mathematical concept of slope and constant of straight line). Once the predicted values are calculated, plugging those values into the annotation data step will enable us to generate line plots with superimposed data points, resulting in a meaningful representation of the data being analyzed. This paper details the SAS logic required to generate these line plots.
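
The straight-line prediction described in this abstract can be sketched in a few lines. This is a minimal Python illustration of the slope-and-intercept idea, not the authors' SAS logic; the function names, the (day, value) data layout, and the sample lab values are hypothetical.

```python
from bisect import bisect_left

def interpolate_value(t0, y0, t1, y1, t_cm):
    """Predict y at time t_cm on the straight line through (t0, y0) and (t1, y1)."""
    if t1 == t0:
        raise ValueError("the two lab time points must differ")
    slope = (y1 - y0) / (t1 - t0)
    intercept = y0 - slope * t0
    return slope * t_cm + intercept

def predict_at(lab_points, t_cm):
    """Predict the lab value at the time of CM administration.

    lab_points: list of (day, value) tuples sorted by day.
    Returns the observed value if a lab was drawn on that day, the
    interpolated value if t_cm lies between two lab days, or None if
    t_cm falls outside the observed range (no extrapolation here).
    """
    days = [day for day, _ in lab_points]
    i = bisect_left(days, t_cm)
    if i < len(days) and days[i] == t_cm:
        return lab_points[i][1]
    if i == 0 or i == len(days):
        return None
    (t0, y0), (t1, y1) = lab_points[i - 1], lab_points[i]
    return interpolate_value(t0, y0, t1, y1, t_cm)

# Hypothetical subject: labs on study days 1, 10 and 20; CM given on day 14.
labs = [(1, 30.0), (10, 40.0), (20, 60.0)]
cm_value = predict_at(labs, 14)
print(cm_value)
```

The predicted value is only a plotting coordinate that places the CM marker on the line; it is not an observed laboratory result.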

DV-294 : Common SAS Tips for Patient Profile Programming
Yan Li, Biopier Inc

A patient profile, as indicated by the name, is a report of a patient’s clinical data collected from various sources, such as adverse events, concomitant medication, laboratory tests, etc. It provides a more detailed view of patient history and data quality in a more comprehensive and reader-friendly format. This article will cover common SAS techniques using SAS Graph Template Language (GTL) combined with PROC REPORT to: (1) make AE / ConMed / dosing panels; (2) present multiple lab results and their correlation in line charts and grid charts; (3) make a universal timeline and scales across multiple panels, manage space, etc. It also proposes ways to protect confidential patient information in line with privacy standards, for example the demographics that could be used to identify patients. It introduces ways to encrypt certain fields for privacy purposes, based on our past studies.

DV-299 : Effective Exposure-Response Data Visualization and Report by Combining the Power of R, SAS programming and VBScript
Shuozhi Zuo, Regeneron Pharmaceuticals
Hong Yan, Regeneron Pharmaceuticals

Understanding the relationship between exposure and response is critical to finding a dose that optimally strikes a balance between drug efficacy and adverse events. Comprehensive Exposure-Response (ER) analyses are therefore needed throughout all phases of clinical trials. This type of analysis may be planned, ad hoc or exploratory, and it requires high-quality data visualization and fast turnaround for dose selection, phase 2 decisions, regulatory submissions, etc. This paper introduces an innovative and efficient way to produce ER analysis figures by combining the power of the R language in data visualization with SAS programming in data processing, dynamic data exchange and statistical procedures. We will use logistic regression analysis as an example, explaining how data quality and accuracy are maintained in SAS and how R functions/code are connected to generate multiple figures by different exposure parameters and endpoints. This paper will also describe the process in detail from input datasets to final outputs, including reading SAS datasets into R, implementing R functions like SAS macros, combining the ggplot2 package with statistical analysis, reading titles and footnotes from an Excel sheet for each output automatically, applying the RTF package and VBScript in R for both RTF and PDF formats, and batch running all the R programs at once to update multiple results.

DV-350 : Dressing Up your SAS/GRAPH® and SG Procedural Output with Templates, Attributes and Annotation
Louise Hadden, Abt Associates Inc.

Enhancing output from SAS/GRAPH® has been the subject of many a SAS® paper over the years, including my own and those written with co-authors. The more recent graphic output from PROC SGPLOT and the recently released PROC SGMAP is often "camera-ready" without any user intervention, but occasionally there is a need for additional customization. SAS/GRAPH is a separate SAS product for which a specific license is required, and newer SAS maps (GfK Geomarketing) are available with a SAS/GRAPH license. In the past, along with SAS/GRAPH maps, all mapping procedures associated with SAS/GRAPH were only available to those with a SAS/GRAPH license. As of SAS 9.4 M6, all relevant mapping procedures have been made available in BASE SAS, which is a rich resource for SAS users. This paper and presentation will explore new opportunities within BASE SAS for creating remarkable graphic output, and compare and contrast techniques in both SAS/GRAPH such as PROC TEMPLATE, PROC GREPLAY, PROC SGRENDER, and GTL, SAS-provided annotation macros and the concept of "ATTRS" in SG procedures. Discussion of the evolution of SG procedures and the myriad possibilities offered by PROC GEOCODE's availability in BASE SAS will be included.

Hands-On Training

HT-091 : Getting Started Creating Clinical Graphs in SAS with the SGPLOT Procedure
Josh Horstman, Nested Loop Consulting

Do you want to create highly-customizable, publication-ready clinical graphs in just minutes using SAS®? This workshop introduces the SGPLOT procedure, which is part of ODS Statistical Graphics, included in Base SAS®. Starting with the basic building blocks, you can construct basic plots and charts in no time. We work through several plot types commonly used in clinical reporting and learn some simple ways to customize each one.

HT-100 : Creating Custom Microsoft Excel Workbooks Using the SAS® Output Delivery System, Part 1
Vince DelGobbo, SAS

This presentation explains how to use Base SAS® software to create custom multi-sheet Microsoft Excel workbooks. You learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output by using the SAS® Output Delivery System (ODS) Report Writing Interface (RWI) and the ODS destination for Excel. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.

HT-111 : YO.Mama is Broke 'Cause YO.Daddy is Missing: Autonomously and Responsibly Responding to Missing or Invalid SAS® Data Sets Through Exception Handling Routines
Troy Hughes, Datmesis Analytics

Exception handling routines describe the processes that can autonomously, proactively, and consistently identify and respond to threats to software reliability, by dynamically shifting process flow and by often notifying stakeholders of perceived threats or failures. Especially where software (including its resultant data products) supports critical infrastructure, has downstream processes, supports dependent users, or must otherwise be robust to failure, comprehensive exception handling can greatly improve software quality and performance. This text introduces Base SAS® defensive programming techniques that identify when data sets are missing, incorrectly formatted, incompletely populated, or otherwise invalid. The use of &SYSCC (system current condition), &SYSERR (system error), and other SAS automatic macro variables is demonstrated, including the programmatic identification of warnings and runtime errors that eliminate the necessity for SAS practitioners to routinely and repeatedly check the SAS log to evaluate software status.

HT-139 : Why You are Using PROC GLM Too Much (and What You Should Be Using Instead)
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
Peter Flom, Peter Flom Consulting

It is common knowledge that the general linear model (linear regression and ANOVA) is one of the most commonly used statistical methods. However, the analytical problems that we encounter often violate the assumptions of this model type, leading to its inappropriate implementation. Lucky for us, modern modeling techniques have been created to overcome these violations and provide better results, which has resulted in the development of specialized SAS PROCs to assist with their implementation. These include: Quantile regression, Robust regression, Cubic splines and other forms of splines, Multivariate adaptive regression splines (MARS), Regression trees, Multilevel models, Ridge Regression, LASSO, and Elastic Nets, among other methods. Covered PROCs include QUANTREG, ROBUSTREG, ADAPTIVEREG and MIXED. This workshop will begin with a brief refresher on regression, including a discussion of the assumptions of the GLM and ways of diagnosing violations. It is designed with the assumption that attendees have a working knowledge of linear regression with PROC GLM.

HT-143 : Hands-on Training on NLP Machine Learning Programming
Kevin Lee, Genpact

One of the most popular Machine Learning implementations is Natural Language Processing (NLP). NLP is a Machine Learning application or service that is able to understand human language. Some practical implementations are speech recognition, machine translation and chatbots. Siri, Alexa and Google Home are popular applications whose technologies are based on NLP. Hands-on Training on NLP Machine Learning Programming is intended for statistical programmers and biostatisticians who want to learn how to conduct simple NLP Machine Learning projects. The hands-on NLP training will use the most popular Machine Learning programming language, Python. The training will also use the most popular Machine Learning platform, Jupyter Notebook/Lab. During the hands-on training, programmers will use actual Python code in Jupyter Notebook to run simple NLP Machine Learning projects. In the training, programmers will also be introduced to popular NLP Machine Learning packages such as keras, pytorch, nltk, BERT, spacy and others.

HT-375 : Build Popular Clinical Graphs using SAS
Sanjay Matange, SAS, LinkedIn

Survival Plots, Forest Plots, Waterfall Charts and Swimmer Plots are some of the popular, frequently requested graphs in clinical research. These graphs are easy to build with the SGPLOT procedure. Once you understand how SGPLOT works, you can develop a plan, prepare the data as per this plan and then use the right plot statements to create almost any graph. This hands-on workshop will take you step by step through the process needed to create these graphs. You will learn how to analyze the graph and make a plan. Then, you will put together the data set with all the needed information. Finally, you will layer the right plot statements in the right order to build the graph. Once you master the process for these graphs, you can use the same process to build almost any other graph. Come and learn how to use the SGPLOT procedure like a pro.

HT-376 : Generating Data Analysis Reports with R Markdown
Phil Bowsher, RStudio Inc.
Sean Lopp, RStudio PBC

RStudio will be presenting an overview of R Markdown and how to combine prose, R code, and figures and tables into a nicely formatted and reproducible final report. In this 1-hour session, we will cover the simple R Markdown syntax and explore options for customizing your reports. You will learn the basics of Markdown & knitr, how to add tables for different outputs, workflows for working with data and how to include and style graphics. In this session, we will explore the gt R package, to easily create presentation-ready display tables. Attendees will also learn how to use R code to create tables summarizing participants (i.e., a “Table One”) and statistical analyses within an R Markdown document. Knitting an R Markdown document to various output formats like HTML, PDF, or Word will be covered. RStudio will be showcasing several compelling examples as well as learning resources. This introductory session is targeted at people who work in the clinical space who either don’t know or don’t currently use R Markdown, or perhaps know the basics but aren’t sure how R Markdown can fit into their research workflow. No prior experience with R Markdown is required. As part of the short course, some available research-related R Markdown reports will be illustrated. On the day of the workshop, we will provide you with an RStudio Cloud project that contains all of the course materials.

HT-377 : Step-by-step SQL Procedure
Charu Shankar, SAS Institute

PROC SQL is a powerful query language that can sort, summarize, subset, join and print results all in one step. Users who are continuously improving their analytical processing will benefit from this Hands-on Workshop (HOW). In this HOW, participants will learn the following elements to master PROC SQL: (1) understand the syntax order in which to submit queries to PROC SQL; (2) internalize the logical order in which PROC SQL processes queries; (3) manage metadata using dictionary tables; (4) join tables using join conditions like inner join and reflexive join; (5) summarize data using Boolean operations.
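
Two of the listed elements, the reflexive (self) join and summarizing with Boolean expressions, can be illustrated outside SAS. The sketch below uses Python's built-in sqlite3 module as a stand-in for PROC SQL, so the dialect differs in details; the staff table and its columns are hypothetical. It relies on the fact that a comparison in SQLite evaluates to 1 or 0, so summing it counts the rows where the condition holds.

```python
import sqlite3

# In-memory database with a small, hypothetical staff table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER, name TEXT, manager_id INTEGER, salary REAL);
    INSERT INTO staff VALUES
        (1, 'Ana', NULL, 90000),
        (2, 'Ben', 1,    60000),
        (3, 'Cy',  1,    45000),
        (4, 'Di',  2,    40000);
""")

# Reflexive (self) join: the table is joined to itself under two aliases
# to pair each employee with their manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM staff AS e
    INNER JOIN staff AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

# Boolean summary: each comparison yields 1 or 0, so SUM counts matches.
n_high, n_low = conn.execute("""
    SELECT SUM(salary >= 50000), SUM(salary < 50000) FROM staff
""").fetchone()

print(pairs)
print(n_high, n_low)
conn.close()
```

The inner join drops the employee with no manager, which is one reason the logical processing order of a query is worth internalizing.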

Leadership Skills

LS-008 : Are you Ready? Preparing and Planning to Make the Most of your Conference Experience
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

Whether you are a first-time conference attendee or an experienced conference attendee, this paper can help you in getting the most out of your conference experience. As long-standing conference attendees and volunteers, we have found that there are some things that people just don’t think about when planning their conference attendance. In this paper we will discuss helpful tips such as making the appropriate travel arrangements, what to bring, networking and meeting up with friends and colleagues, and how to prepare for your role at the conference. We will also discuss maintaining a workplace presence with your paying job while at the conference.

LS-016 : One Boys’ Dream: Hitting a Homerun in the Bottom of the Ninth Inning
Carey Smoak, S-Cubed

One boy’s dream of hitting a homerun in the bottom of the ninth inning has been realized in my career. My career started out as an epidemiologist in academia. My SAS® skills were pretty basic back then. My SAS skills advanced tremendously as I transitioned to working as a statistical SAS programmer in the pharmaceutical and medical device industries. My career has varied from strictly working as a statistical SAS programmer to managing statistical SAS programmers. My interest in statistics began with my interest in baseball. Little did I realize that my interest in statistics as a teenager would lead to a fulfilling career and, thus, fulfill my childhood dream.

LS-037 : Microsoft OneNote: A Treasure Box for Managers and Programmers
Jeff Xia, Merck

Microsoft OneNote has many functionalities that are helpful for managers and lead programmers who have increasing responsibility for managing project deliverables as well as understanding staff availability to ensure quality deliverables in compliance with department SOPs. Additionally, managers and leads have the responsibility to keep upper management informed of the overall status of the group's operation, including project progress, success stories, challenges and potential problems within the group. To effectively perform their daily operations with so many responsibilities in different directions, it is essential for managers and programming leads to find an efficient and effective way to organize the necessary information throughout the process so that they can locate files and information quickly to resolve any unexpected issues. Microsoft OneNote is a tool that can serve this purpose. This paper briefly introduces some key features of Microsoft OneNote as well as the hierarchy of Notebook, Section and Page within OneNote. It also provides three examples of using OneNote to organize information in different categories as the manager or lead programmer of a statistical SAS programming group in the pharmaceutical industry.

LS-044 : Leading a successful team through complex environment
Kala Shivalingaiah, VDS

With the industry landscape lately moving towards the FSP model to save cost and bring global teams together under one umbrella, there are many challenges from inception onward until we reach the finish line. I would like to share my journey of successfully implementing this model in my recent experience. I would like to dive into a few of the challenges we face on this journey. How do we engage a global team? How do you maintain business continuity and quality while handling your stakeholders, master, team and many others? Please refer to the outline attached. I plan to prepare detailed slides with examples and stories to share.

LS-059 : An Effective Management Approach for a First-Time Study Lead
Himanshu Patel, Merck & Co.
Jeff Xia, Merck

Becoming a first-time study lead from a programming study team member is a significant step that comes with various challenges. Perseverance plays a substantial role in the development process and helps to overcome these challenges. Programming of tables, listings, graphs, and datasets is mostly a logical and technical task, whereas leading the study involves additional management tasks. The successful accomplishment of any project is determined by the technical and management approach used by the study lead. A lead can adopt different management approaches, which include technical and leadership skills, effective communication, team motivation, stakeholder relationships, proper planning, and risk management. These factors allow study leads to create a committed and robust team that is dedicated to achieving the best result. This paper illustrates certain key features of an effective management approach that can be implemented to ensure effective and robust study management. This may be relevant and helpful to those hoping to be a study lead or those who recently took on study lead responsibilities for the first time.

LS-118 : Rethinking Programming Assignments
Scott Burroughs, PAREXEL International

Programmers working on a project team in the pharmaceutical industry can reside in all regions of the world, often with little or no work hours overlapping. What happens when deadlines are approaching with much work to be done and the production programmer for a data set or display and the QC programmer work in different regions of the world? Often it means working odd and/or long/extra hours so questions can be answered immediately and more back-and-forth production and QC runs can be made at the same time to get your tasks done on time. But does it have to be this way? Can we re-think this process so people aren’t working so many odd/extra hours to get things done? This paper will examine potential fixes, including findings from an experiment trying to see what works and what doesn’t.

LS-120 : Inspirational Leadership - The Infinite Versus Finite Approach
Jim Baker, Independent Leadership Consultant

Inspirational leadership is achievable at all levels within an organization and is not limited to CEOs and executives. The finite approach to leadership is managing resources with win / lose scenarios created to measure performance and "worth" of the resource. The infinite approach to leadership focuses on the person and building a trusting relationship where the success of the person results in open communication, career development, and achieving appropriate departmental and corporate goals. Several of the key attributes for developing a long-term approach to leadership include: having empathy, being supportive, and believing in the individual. Inspirational leadership is more about ensuring the success of your direct reports and recognizing each person and team for their contributions versus being focused on you. The final section of the presentation and paper emphasizes the importance of mentoring and setting an example - "walking the walk and talking the talk". The ultimate result is delegating more autonomy, decision responsibility, and leadership opportunity to the managers, individually and as a leadership team. This is a significant part of creating job satisfaction. The conclusion is that infinite leadership results in trusting relationships where development of people results in higher job satisfaction and career growth. The environment for solving problems and achieving success is positive and ultimately evolves into higher productivity for all.

LS-123 : Schoveing Series 5: Three Question Marks: Your Guide to a Quality NO
Priscilla Gathoni, AstraZeneca, Statistical Programming

Have you ever said NO and felt guilty, ashamed, anxious, frustrated or angry afterwards? Or better still, have you ever said NO and walked away feeling calm, poised, balanced, and centered? The two-letter word NO can pose both danger and authority when proclaimed, and that is why understanding the various ways of saying NO is crucial. Parenting, marriage, teaching, leadership, managing, recruiting, coaching, mentoring, facilitating, hiring, firing, and many other roles require the mastery of the word NO. The most valuable asset you own is your time, and protecting this valuable commodity requires a tactical and strategic method of delivery when saying NO. Your time on earth is too precious to be wasted with the inability to say NO. This paper will help you unlock 7 ways of saying NO by deprogramming your habitual ways of thinking to achieve greater self-confidence, control, and trust, and to build healthy relationships. The art of saying NO in a quality way allows you to surround yourself with positive-minded people and get rid of toxic people and relationships.

LS-131 : Live Fire Exercises: Observations of a Salty Field Training Officer on How to Better Train and Retain Quality Programmers
Frank Menius, YPrime

After 12 years of fighting crime in North Carolina’s capital city, I decided to put my badge away and start a new career in clinical research. Through this major transition I was fortunate to have a supportive spouse, a company with an internship where I could cut my teeth, and a great manager willing to take a chance on someone with alternative “life experience.” As I have advanced in my new career, I realize just how fortunate I am to be here. Most new hires at other clinical research organizations are not nearly as lucky as I was. Too often programmers are hired and expected to be 100% billable from day one. Training is often limited, consisting only of reading some SOPs before the programmer is expected to rapidly perform by company standards. I call this a live fire exercise: high stress and likely to produce a product with holes in it. This form of hiring is in stark opposition to what I was used to. Police recruits, including those with years of experience, are expected to complete an academy and field training for a year. Not even the chief of police hits the streets on day one. A year of programmer training may not be viable, but bolstering internships, expanding mentorships, job shadowing, and continuous training and evaluation would increase morale and productivity, and contribute to a successful junior programmer pipeline.

LS-162 : Which Side of the Fence Are You on? A Head of Statistical Programming’s perspectives working for Sponsor and CRO
Clio Wu, FMD K&L Inc.

Over the years, more and more pharmaceutical, biotechnology and medical device companies (Sponsors) have chosen to partner with contract research organizations (CROs) for their clinical trial data management, statistical programming and regulatory submission needs. After 18 years working in the major pharmaceutical and biotechnology industries, and partnering with most of the global CROs, in January of 2019 I decided to expand my horizons by shifting my career to the other side of the aisle and joined a reputable CRO. This paper will share my perspectives on what to expect working for Sponsors and CROs in a leadership role, the challenges of navigating the Sponsor-CRO relationship, and the best approaches to build a healthy, collaborative, effective and rewarding partnership that benefits both sides.

LS-185 : Leadership Lessons from Start-ups
Siva Ramamoorthy, Ephicacy Lifesciences Analytics

Over the last two decades, we have seen extraordinary growth of start-ups and start-up ecosystems. In countries like the US, China, and India, ‘billion dollar plus valuation’ unicorns, successful entrepreneur role models, and enormous media buzz are center stage. Start-ups are characterized by enormous zeal about bringing a new idea to market and making a difference in the world. Such start-ups often begin with a single idea or a single solution to an existing problem and with zero investment. Still, they attract hundreds and thousands of dreamy-eyed employees who work for free with a promise of deferred future wealth. The failure rate of such start-ups is as high as 90%, yet employees are happy to take the risk. Besides being strongly motivated, start-up employees are enormously productive and bring innovative ideas at rapid speeds. How is it that start-ups have such passionate, committed employees? How is it that their employees excel in applying innovation? In the life sciences space, we have an equal and perhaps larger opportunity to make a difference in people’s lives. We create solutions that enable better medicines to be created. Like start-ups, we have an opportunity to get our employees passionately involved in the work by clearly communicating the enormous impact of their work. In this paper I articulate three categories of leadership lessons from the start-up ecosystem for the life sciences industry, namely: (i) Systemic Innovation Management: following a winning way of qualifying innovative ideas and building them into products/solutions; (ii) People Management Lessons; (iii) Velocity.

LS-263 : Theory of P and P - Personalized and Predictive Human Resource Management – Evolving face of HRM
Charan Kumar Kuyyamudira Janardhana, Ephicacy Lifescience Analytics Pvt. Ltd.

Human Resource Management is a key function in every organization. Clinical trials and tests can measure the ADME of any drug; however, the psychological factors of us humans employed in any organization have so far been restricted to theories. HRM has been a vital function centrally managed by a set of colleagues through a set of processes. The KPIs and measures have been the same as those of any other function or department. Enter Gen Y, or millennials, followed by Gen Z, and the pattern is evolving. In clinical trials today, we focus on personalized medicine and a predictive approach. The same applies to HRM too. The ability to handle pressure, prioritize tasks and perform to one’s potential has been shown to differ among us. This paper focuses on handling every individual as a different subject, since strengths and weaknesses differ among us. This leads to a personalized and predictive way of managing individuals by their respective managers. The measures include four spheres. Job fitment sphere: personalized psychological tests; On-the-job sphere: monitoring behavioral trends, interaction with fellow colleagues and the yearning to learn; Personal sphere: a satisfying and healthy work-life balance; Directional sphere: the goals of an individual that lead to his/her success and happiness. All these spheres give birth to the theory of P and P, i.e., Personalized and Predictive HRM, which is crucial in creating a satisfied workforce that will coherently contribute to the culture of an organization. Possible ways to implement such a change, and its challenges, are discussed.

LS-265 : Building a Strong Remote Working Culture – Statistical Programmers Viewpoint
Ravi Kankipati, Nola Services
Prasanna Sondur, Ephicacy LifeScience Analytics

Remote working, or telecommuting, has become more of a norm than an exception. Adding the dimension of an offshore Functional Service Provider that extends this to its employees whilst maintaining the status quo is a challenge in itself. Companies situated offshore are embracing this form of flexible hiring to cast a wider net or to retain existing talent. Only a handful of companies in the statistical programming sphere have been successful in having a fully remote workforce. Statistical programming is a diverse field involving frequent discussions with fellow programmers and client Points of Contact (PoCs) within the Statement of Work (SOW). Strong liaison with data management staff, statisticians and medical writers is expected. Remote working thus poses a challenge for effective communication within and between global teams. Some examples are communicating the timelines of each deliverable, data issues, validation comments and work transitions. The authors provide their thoughts on overcoming such challenges and on building a strong, transparent working culture. Other common issues like upgrading skill sets, time management, and work-life balance are also discussed. The authors, one with most of his experience onsite and the other offshore, share candid experiences from their statistical programmer journeys in the current remote working realm.

LS-272 : Leadership Skills: Tips to Retain and Motivate Millennials at Work
Richard DAmato, PPD
Ajay Gupta, PPD

Whatever your opinions are of millennials, they are rapidly becoming a growing and important part of the workforce, and they work and think very differently than the generations before them. For instance, they often use technology to work more efficiently, and their desire to understand the “why” behind everything can create opportunities for your company to refocus on doing the right things to meet big-picture goals. Many millennials don’t value titles or appearances. They don’t need to wear nice suits or work in fancy offices. They are not very interested in the organizational flowcharts that allow employees to move up the ladder or on a set schedule. In the past, workers found validation in climbing the corporate ladder, receiving a new job title and office and staying loyal to the same company for many years. Millennials find very little value in any of this. This can be operationally disruptive to an organization, and frustrating for the manager trying to motivate and lead a millennial to work with a team and perform. The phrase “millennials in the workplace” may bring to mind open work spaces with bean bag chairs, organic kale salad bars and Starbucks coffee in the break room and in-office yoga. It may bring up thoughts of cell phones, social media and remote work. Or, it may bring up stereotypes around entitlement, laziness and an inability to commit to a job or a company. Experts predict that by 2025, three-fourths of workers are going to be millennials.

LS-297 : The Art of Work Life Balance.
Darpreet Kaur, Statistical Programmer

We are the healthcare industry, engaged in putting the pieces together to make this world a little healthier, a little happier. In the process, we develop lifestyles that might not be the best for our own health and happiness. What can we, as professionals, do and what steps can employers take to ensure a healthy and happy workforce? How can employers offer better work life balance? 1-Access to exercise/pay for gym memberships. 2-Remote work/flexible hours. 3-Create opportunities for casual mingling. 4-Offer good healthcare and retirement benefits. 5-Continuing education and learning opportunities. 6-Offer volunteering opportunities/community engagements. 7-Team building events. 8-Mandatory vacation policy. 9-Gratitude towards the employees and encouragement to do better. 10-Regular one on one meetings to address any challenges at work. How can employees ensure better work life balance? 1-Set realistic goals and priorities. 2-Take time off for physical and mental needs. 3-Make physical, mental and emotional health a priority. 4-Exercise and eat healthy. Drink enough water. 5-Take short breaks to improve productivity. 6-Give social media and technology a break. 7-Set boundaries and work hours. 8-Ask for help if you are stuck at work. 9-Invest in kitchen gadgets that save time. 10-Reconsider your job profile. There is no definition of an ideal work life balance. We all have different goals and different expectations from our lives and work, so my idea of work life balance is different from yours. Balance is a very personal thing and only you can decide what lifestyle choices suit you the best.

LS-309 : Leadership from the point of view of individual contributor
Aravind Gajula, PPD, Inc

Relationship between leadership team and individual contributors is key to the success of an organization. Both parties need to work together to establish a strong relationship of mutual respect, mutual understanding, and mutual support. Leaders are trained on how to manage, motivate, encourage and accomplish the assigned tasks from individual contributors. Leaders may have knowledge and understanding of organization priorities & requirements and they work to make sure individual contributors are aware of these priorities and support them. Many times, the approach of top down implementation of priorities can have negative impact and develop trust issues between leaders and individual contributors. It is important to understand and evaluate how individual contributors see the leadership and their approach to lead them. Individual contributors, many times, have different opinion of how leaders should behave and contribute to success. The authors will elaborate on the importance of building a good relationship and share the tips on how to build it, from individual contributor point of view. Target Audience would be anyone with any skill level.

LS-320 : For My Next Act… Leveraging One’s Skills to Start Anew in Pharma / Biotech
David Polus, TalentMine

After a false start in the Automotive industry, I have spent over thirty years in Pharma / Biotech. As Dad used to say, “son, you made the right call.” After 16 years of being the best programmer in the industry, someone had the bright idea to move me to management. This was fun for the next 14 years, but if I was to make it another 15 years in the industry, I had to try something new. Over the course of my career, I’ve seen my colleagues take many different career paths. Some moved from programming to technical leadership and thrived. Others took the path I did and cut themselves off from programming altogether. Still, many programmers are more than satisfied to remain technical, adding new languages and skills to their arsenal to stay relevant in the industry. It was one individual who moved from programming to management to the Dark Side of Sales that truly intrigued me, causing me to consider if there might be something more out there. I’ll tell you the story of Mindy Kiss (a real person) and the lessons I learned along the way to becoming a recruiter after thirty years. Outline: a) Intro b) Loving one’s work c) Becoming the best d) Understanding oneself e) Career decision tree f) Small, logical moves g) Large, illogical leaps h) Next steps

LS-322 : Strategies to building, growing and maintaining a high-performance global statistical programming team
Amanda Truesdale, Veristat
Bill Donovan, Wright Avenue Partners

The paper will review best practices and share specific examples of how a company can work to more effectively attract, hire, train and maintain a global statistical programming team. Over the last 10 years Veristat has worked to build a global team from its initial modest staff of statistical programmers located in a central office. The growth of resources was achieved with a set of core strategies and a variety of methods to create and grow a high-performance global programming team. The Veristat approach and overall strategic direction to grow and expand were (and are) focused on five specific areas for the Biometrics Group, which will be outlined as follows: Culture, Strategic & Scalable Work Teams, Talent & Resource Management, Team & Organizational Model Management, and Project Management. Veristat successfully developed and handled vendor management to increase staff while integrating remote and offshore resource models.

LS-330 : Leadership and Programming
Kriss Harris, SAS Specialists Ltd

The world is looking for leaders, and you can be a leader! As a Programmer or Statistician you have the power to lead and influence the clinical study, and essentially, you have the power to improve the quality of people’s lives. As a leader, you need to be able to step forward and put your hand up, although this can be a challenge because as Programmers and Statisticians, a lot of us can be introverted and live in our head more, and shy away from taking the lead. The message here isn’t implying that introversion is something that needs to be cured. The message here as quoted by Susan Cain is merely implying that “Introverts need to trust their gut and share their ideas as powerfully as they can. This does not mean aping extroverts; ideas can be shared quietly, they can be communicated in writing, they can be packaged into highly produced lectures, they can be advanced by allies.” Whether you are introverted or not, this presentation will help you to lead yourself, lead others, obtain your goals and ultimately help people live better quality lives. I’ll be sharing with you my learnings from the Leadership Academy training which I took in 2019, and my other personal development learnings.

LS-359 : Leading without Authority: Leadership At All Levels
Janette Garner, MyoKardia, Inc.

The idea of what it means to be a leader takes on different meanings for each person and can vary across situations. One image is that of a manager directing a team under tight timelines. Another example would be a team member who proposes a solution to a novel problem. At its core, leadership is the process of driving influence from others to achieve a goal. This does not require seniority, organizational hierarchy, titles or personal attributes, though these are often conflated. Anyone can be a leader, including those without direct reports. This paper examines different leadership styles and contains numerous examples of how an individual contributor can make an impact in an organization, both large and small.

LS-364 : Project Metrics- a powerful tool that supports workload management and resource planning for Biostats & Programming department.
Jian Hua (Daniel) Huang, BMS
Rajan Vohra, BMS
Andy Chopra, BMS

For a pharmaceutical company with a large pipeline (i.e. several hundred ongoing studies) and a big Biostats & Programming team (i.e. > 200 people), it can be challenging for the management team to do workload management and resource planning efficiently. My team created a “project metrics” tool that collects Biostats & Programming workload information in a centralized location. The metrics contain information on work deliverables, organized under the following categories: Function, TA Group, Lead, Compound, Study and Deliverable. The tool provides pre-defined pull-down lists (i.e. compound list, type of deliverables) to ensure the consistency of data entry. In addition, it creates customized dashboard reports that give the management team both a comprehensive overview and the details of related information; these reports include a study summary, resource prediction, a 360-degree dashboard report and a Sankey diagram of workload distribution. Finally, development of some new features is discussed at the end of the paper. In summary, the project metrics tool provides a centralized location for data collection and a powerful tool for summary reporting. It makes workload management and resource planning across the Biostats & Programming department more feasible and efficient for the management team.

LS-373 : Managing Transitions so Your Life is Easier
Kirsty Lauderdale, Clinical Solutions Group
Paul Slagle, Clinical Solutions Group

Transitions can be a very troublesome time for everyone involved; managers, programmers, and clients. It seems like a very simple thing to manage however people still struggle with programming transitions which leaves a huge knowledge gap when a programmer departs the team. Effectively managing the information flow during a transition makes life much simpler, for both the transitioner and the transitionee. The most common programming complaint from programmers taking over code is that there are chunks of code that do "things" that the rest of the team is not quite sure of. Sometimes the code performs magic that nobody else on the team actually understands. Management is also normally tasked with understanding what is the actual status and how much time it will take for the new person on the project to take over. All of this adds a high level of stress to what often is a project already on a tight timeline. This paper will lay out these challenges and then present an effective transition plan which covers both the management expectations for the transition as well as the expectations of the programmers. This can be applied to single programs, full studies, and across portfolios.

Medical Devices

MD-020 : CDISC Standards for Medical Devices: Historical Perspective and Current Status
Carey Smoak, S-Cubed

Work on SDTM domains for medical devices began in 2006. Seven SDTM domains were published in 2012 to accommodate medical device data. Minor updates to these seven SDTM domains were published in 2018. These seven SDTM domains are intended for use by medical device companies in getting their products approved/cleared by regulatory authorities and for use by pharmaceutical companies to store ancillary device data. As evidenced by the Therapeutic Area User Guides, pharmaceutical companies are using these seven SDTM domains for ancillary devices. However, adoption of these seven SDTM domains by medical device companies seems to be happening rather slowly. In 2014, a statistician at the Center for Devices and Radiological Health (CDRH) presented the issues that CDRH has with medical device submissions, and in 2015 the CDISC medical device team presented how the CDRH issues could be solved with CDISC standards. Recently CDRH published a document titled ‘Providing Regulatory Submissions for Medical Devices in Electronic Format.’ In 1999, the FDA published a similar document for pharmaceutical products, which was the beginning of the development of CDISC standards for the pharmaceutical industry. While CDRH has not made a statement that it is moving towards requiring CDISC standards for medical device submissions, the publishing of this document is a step in the right direction.

MD-041 : Successful US Submission of Medical Device Clinical Trial using CDISC
Phil Hall, Edwards Lifesciences

It is not yet mandatory for medical device trial data to be submitted using CDISC, but the Center for Devices and Radiological Health (CDRH) accepts clinical trial data in any format, including CDISC. This paper serves as a case study of the successful FDA submission of the Edwards Lifesciences PARTNER 3 trial, which utilized SDTM and ADaM datasets. There will be a review of the SDTM domains used for medical device-specific data and a general discussion of the submission approach.

MD-360 : An overview of medical device data standards and analytics
Shilpakala Vasudevan, Ephicacy Lifescience Analytics

Medical device trials are now widely conducted for different therapeutic and diagnostic reasons. Device trials differ in nature from traditional clinical trials and involve different study designs, data collection methods and data standards. Over the last few years, SDTM domains have been identified to accommodate device data; these can be used alongside the regular domains usually found in clinical trials. Additionally, there are analysis data set proposals currently in use. In this paper, we will see how medical device clinical trials are different, what data standards can be used for different use cases, with examples, and what potential improvements we can look for in the future.

Quick Tips

QT-035 : A SAS Macro for Calculating Confidence Limits and P-values Under Simon’s Two-Stage Design
Alex Karanevich, EMB Statistical Solutions
Michael Ames, EMB Statistical Solutions

Simon’s two-stage designs are popular single-arm, binary-endpoint clinical trial designs that include a single interim analysis for futility. One is typically interested in the proportion of successes (response rate), but SAS does not provide p-values or confidence limits for this statistic. Calculating these is not straightforward due to the interim analysis: one is required to make a multiplicity adjustment for any trial that proceeds beyond the first stage. This quick tip provides derivations (along with a SAS macro) for calculating such a trial’s p-value and associated confidence limits. The macro requires the user to input the design parameters for the trial, as well as the total observed successes.
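
As a rough illustration of why the interim analysis matters, the sketch below computes a p-value for a trial that continued to the second stage, using one commonly described ordering of outcomes (sum over every first-stage result that passes the futility bound). It is written in Python rather than SAS, it is not the authors' macro, and it may differ from the derivation in the paper. The names `n1`, `r1`, `n2` and `p0` (stage sizes, first-stage futility bound, null response rate) are assumptions chosen for the example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); equals 1 when k <= 0, 0 when k > n."""
    return sum(binom_pmf(j, n, p) for j in range(max(k, 0), n + 1))


def simon_p_value(total_successes, n1, r1, n2, p0):
    """P-value for a trial that proceeded to stage 2 (assumed ordering).

    Sums, over every first-stage count x1 that passes the futility bound
    (x1 > r1), the null probability of x1 times the null probability that
    stage 2 brings the total to at least the observed total.
    """
    return sum(
        binom_pmf(x1, n1, p0) * binom_upper_tail(total_successes - x1, n2, p0)
        for x1 in range(r1 + 1, n1 + 1)
    )


# Hypothetical design: 10 patients in stage 1, continue if more than 3 respond,
# 19 more patients in stage 2, null response rate 0.2, 9 responders in total.
p_value = simon_p_value(total_successes=9, n1=10, r1=3, n2=19, p0=0.2)
```

Ignoring the interim look and using a plain binomial tail over all `n1 + n2` patients gives a different number, which is the adjustment the abstract refers to.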

QT-038 : Programmer guild for missing data handling
Jianguo Li, L&L Clinical Research Group

This paper summarizes the basic SAS techniques for identifying missing values. The missing-value handling options for common SAS procedures are also summarized. Missing data handling is a very important topic in clinical trials. Single imputation and multiple imputation are discussed.

QT-049 : PROC COMPARE: Misnomer of the statement “NOTE: No unequal values were found. All values compared are exactly equal.”
Alex Ostrowski, Pfizer

When validating SAS code and datasets, it is common to use PROC COMPARE and receive the following statement: “NOTE: No unequal values were found. All values compared are exactly equal.” The logical interpretation of this statement is that the datasets are “exactly equal” and there are “no unequal values” between the two datasets. However, this is not always the case. As such, there is a risk that, if this is the primary method of proving equivalence between data sources, key differences may be overlooked. This paper will explore how to detect and fix these differences by comparing datasets with PROC SQL. With this method, the datasets are extensively compared and will only generate an “exactly equal” result if the datasets are indeed equivalent. If the datasets are not equivalent, differences are logged into “major” and “minor” groups and are summarized in the output. This method also lists the detailed differences between datasets by individual variable, indicating changes in label, type, order, length, format, and informat. This provides a solution to avoid being misled by PROC COMPARE while helping make an improved assessment of “data equivalence” by performing additional comparisons and clearly stating the comparison results, allowing programmers to compare datasets accurately and with confidence.

QT-127 : Remove Strikethrough Texts from Excel Documents by VBA Macro
Li Liu, Merck & Co., Inc at Upper Gwynedd PA

This paper focuses on programmatically removing strikethrough text embedded in Excel spreadsheet documents using a VBA macro. The Data Definition document (i.e., define.xml) is a critical part of FDA submissions. The ADaM specifications in Excel spreadsheet format are the source document for creating the Analysis define.xml document. Programmers must clean up the ADaM specifications by removing irrelevant text, including strikethrough text embedded in the spreadsheets for tracking. Manually removing strikethrough text from an Excel spreadsheet is cumbersome and prone to errors. Extra caution is needed to make sure the entire strikethrough text removal is accurate, especially when the process is done manually. The user must also make sure that normal text remains and is not deleted by mistake. The VBA macro tool presented in this paper removes strikethrough text by scanning the text in each cell of every sheet and performing the cleanup throughout the entire workbook automatically. It is an especially handy tool for programmers to use when generating high-quality data specifications for creating the Data Definition document.

QT-128 : A Solution to Look-Ahead Observations
Yongjiang (Jerry) Xu, CSL Behring
Yanhua (Katie) Yu, PRA Health Sciences

In clinical trials, it is common to compare certain laboratory test results across a period or at various time points. Examples include the confirmation of drug responses in multiple myeloma studies and hematology indicators recovering from nadir to a safe level. These data are usually mapped into findings domains in SDTM and programmed into BDS-structure ADaM datasets. However, it is difficult to look ahead at the values of future observations or look back at the values of previous observations in SAS® because SAS reads one observation at a time into the PDV. Comparisons across observations are therefore a known difficulty in SAS data management. This paper will present a practical solution to transpose the data and perform comparisons and calculations across observations using DO loops and arrays. Software used: SAS v9.4. Operating system: Windows. Audience: intermediate skill level.

QT-167 : Transpose Data from Wide to Long With Output Statements
Lilyana Gross, Precision for Medicine

Within SAS, PROC TRANSPOSE can be a powerful tool. However, when working with a combination of character and numeric variables, missing values, and multiple identifiers, transposing data from wide to long can be cumbersome and unintuitive. This paper will expand on the process of using OUTPUT statements within a DATA step to transpose data from wide to long. Using OUTPUT statements within a DATA step allows programmers to customize variable names and lengths and to create new variables based on the desired outcome of the transpose. In this paper, we will demonstrate the strength of this method with an example SDTM.LB created from laboratory data collected by visit, in wide format. This is a quick tip that is friendly to beginner and senior programmers alike.
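
The idea of emitting one long-format record per source column can be sketched as follows. This is a Python illustration of the logic only, not the DATA step code from the paper. The function name `wide_to_long`, the input column names and the choice to skip missing results are assumptions made for the example; `LBTESTCD`, `LBORRES` and `LBORRESU` are standard SDTM LB variable names.

```python
def wide_to_long(wide_rows, id_keys, test_columns):
    """Turn one wide row per visit into one long row per lab test.

    test_columns maps a wide column name to (test code, unit).
    """
    long_rows = []
    for row in wide_rows:
        for column, (testcd, unit) in test_columns.items():
            value = row.get(column)
            if value is None:
                # Assumed rule: skip missing results rather than emit empty rows.
                continue
            record = {key: row[key] for key in id_keys}
            record.update(
                {"LBTESTCD": testcd, "LBORRES": str(value), "LBORRESU": unit}
            )
            # Appending here plays the role of an explicit OUTPUT statement.
            long_rows.append(record)
    return long_rows


# Hypothetical wide laboratory data, one row per subject and visit.
wide = [
    {"USUBJID": "001", "VISIT": "WEEK 1", "alt": 32, "glucose": 5.4},
    {"USUBJID": "002", "VISIT": "WEEK 1", "alt": None, "glucose": 6.1},
]
columns = {"alt": ("ALT", "U/L"), "glucose": ("GLUC", "mmol/L")}
lb = wide_to_long(wide, ["USUBJID", "VISIT"], columns)
```

Because each record is built explicitly, names, types and derived variables are fully under the programmer's control, which is the advantage the abstract attributes to the OUTPUT-statement approach.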

QT-178 : Python applications in clinical data management
Jianlin Li, Q2 Business Intelligence

Programming is involved extensively in clinical data management, from offline edit checks to data reconciliation, and from query status reporting to trial status monitoring. As a modern, general-purpose, high-level programming language, Python is very powerful in data manipulation. This paper presents a few Python applications that improve clinical data management functionality: (1) mapping an internal database specification to an EDC system specification; (2) query status reports from audit trails; (3) data listings from raw datasets for selected subjects, domains and variables. SAS macros are created to generate input parameters and configuration files for the Python programs. This enables SAS programmers who do not know Python to finish tasks by simply calling SAS macros. With its simple syntax rules and readability, Python has more potential applications that could make clinical data managers’ lives easier.

QT-183 : A Brief Understanding of DOSUBL beyond CALL EXECUTE
Ajay Sinha, Novartis
M Chanukya Samrat, Novartis

SAS® is one of the most widely used programming languages in the clinical space. Releases from SAS® 9.3 R2 onward offer a function, DOSUBL, that can come in very handy for programmers who want to shorten code and make it more efficient. This paper explains the use of the DOSUBL function with the help of examples that make code more robust and show the best use of this function. DOSUBL enables immediate execution of SAS code after a text string is passed. It is somewhat similar to the CALL EXECUTE routine but differs significantly; this paper presents the merits and pitfalls of DOSUBL and where it can be put to best use. The DOSUBL function has a single argument, which is a string value. In a DATA step, the function submits code to SAS for immediate execution. DOSUBL should be used in a DATA step. SAS documentation states that this function can also be used with %SYSFUNC outside a step boundary. DOSUBL executes code in a different SAS executive (known as a side session). The function is slower than the CALL EXECUTE subroutine and uses more CPU resources; however, the main advantage of using DOSUBL is immediate execution of the code in the side session.

QT-194 : Addressing (Non) Repeating Order Variables in Proc Report
Bill Coar, Axio Research

In creating summary tables or listings of raw data, Proc Report is commonly used within the pharmaceutical industry. Summary statistics are often placed in a reporting dataset to be displayed using Proc Report. In doing so, it is extremely common for the first few columns to be ORDER variables. While this helps with sorting, another reason for using an ORDER variable is so that the value only appears on the first record when displaying multiple rows with the same value of the ORDER variable. Furthermore, the value repeats on subsequent pages when the records (with the same ORDER value) span multiple pages. However, when using an ODS destination this is not always the case. This quick tip addresses non-repeating ORDER variables as they apply to the RTF and PDF ODS destinations. It assumes that users have a (very) basic understanding of Proc Report and of the RTF and PDF ODS destinations. The use of ODS is required in this application, which uses SAS® 9.4 in a Windows environment.

QT-209 : Automation of Conversion of SAS Programs to Text files
Sachin Aggarwal, Rang Technologies
Sapan Shah, Rang Technologies

Statistical programmers in clinical trials develop SAS programs to produce SDTM and ADaM datasets, tables, listings and graphs according to study requirements. Apart from these SAS programs, other programs are developed for various ad-hoc and post-hoc requests. These programs have the same file extension, .SAS, and thus can be used on various SAS platforms, e.g. SAS desktop, SAS Enterprise Guide, etc. The .SAS extension is a requirement for a program to be functional on these platforms. The FDA, however, requires all SAS programs in an NDA submission to be text files without the .SAS extension. The macro presented in this paper can convert multiple SAS programs saved in one folder to text files in a single run without making any changes to the original SAS programs. This SAS-to-TEXT macro saves programmers an enormous amount of time by eliminating manual conversion of every single SAS file. It also removes the errors that may occur when SAS files are converted to text files manually. The macro follows a process that does not change the original SAS files in the original folder. It reads and copies the .SAS files from the original folder into a second folder, where it converts the copied .SAS files to .TXT files. After this, it deletes the copied .SAS files in the second folder and leaves only the SAS programs in .TXT format as output.
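
The copy-without-touching-the-originals process can be sketched in a few lines. The version below is a Python illustration, not the authors' SAS macro, and it simplifies the described steps by copying each program directly under its new .txt name instead of copying, converting and then deleting. The function name `copy_sas_as_text` and the folder names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_sas_as_text(source_dir, target_dir):
    """Copy every .sas file in source_dir to target_dir with a .txt extension.

    The originals are only read, never renamed or modified.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for program in sorted(source.iterdir()):
        if program.is_file() and program.suffix.lower() == ".sas":
            destination = target / (program.stem + ".txt")
            shutil.copyfile(program, destination)
            written.append(destination)
    return written


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "programs"
    src.mkdir()
    (src / "adsl.sas").write_text("data adsl; set sdtm.dm; run;\n")
    (src / "notes.docx").write_text("not a program")
    outputs = copy_sas_as_text(src, Path(workdir) / "submission")
    output_names = [path.name for path in outputs]
    originals_intact = (src / "adsl.sas").exists()
```

Writing to a separate folder keeps the working programs runnable on SAS platforms while the text copies are packaged for submission.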

QT-213 : A SAS Macro for Dynamic Assignment of Page Numbers
Manohar Modem, Cytel
Bhavana Bommisetty, Vita Data Sciences

In the clinical domain we usually create many safety and efficacy tables with various statistics. While creating these tables, the dataset with statistics is passed to Proc Report to create RTF output. Using the BREAK statement with the PAGE option in Proc Report, we can make sure that each unique value of a parameter starts on a new page. If we want to make sure that a group of statistics does not break abruptly between pages, we may need to use conditional statements to assign page numbers. Whenever the data or table shell is updated, the number of records in the dataset with statistics may change, which in turn requires an update to the conditional statements to prevent abrupt breaks in the output. This led to an effort to create a macro that can be used for any table with simple modifications to macro parameters. The purpose of this paper is to describe how the page numbers were dynamically assigned using a SAS macro.
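
One simple way to assign page numbers dynamically, so that no group of rows is split across pages, is a running line count. The sketch below is a Python illustration of that general idea and is not the authors' macro. The function name `assign_pages` and the parameter `lines_per_page` are assumptions, and each group is assumed to fit on a single page.

```python
def assign_pages(group_sizes, lines_per_page):
    """Return one page number per group so that groups never straddle a page.

    group_sizes holds the number of report lines each group occupies,
    in display order.
    """
    pages = []
    page, used = 1, 0
    for size in group_sizes:
        # Start a new page when the whole group no longer fits on the current one.
        if used > 0 and used + size > lines_per_page:
            page += 1
            used = 0
        pages.append(page)
        used += size
    return pages


# Four groups of statistics on pages that hold eight lines each.
page_numbers = assign_pages([4, 3, 5, 2], lines_per_page=8)
```

The resulting page number can be stored on every row of its group and used as the variable that triggers the page break, so the assignment adapts automatically when the number of records changes.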

QT-249 : Text Wrangling with Regular Expressions: A Short Practical Introduction
Noory Kim, SDC

Regular expressions provide a powerful way to find and replace patterns in text, but their syntax can seem intimidating at first. This paper presents a few simple practical examples of adapting text from data set or TLF specifications for insertion into SAS programs, using the text editor Notepad++. This paper is intended for SAS users of any skill level. No prior knowledge of regular expressions is needed.
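
As a small illustration of the find-and-replace idea, the sketch below turns tab-separated specification lines (variable name, then label) into the body of a SAS LABEL statement. It uses Python's `re` module rather than the Notepad++ dialog used in the paper, and the specification text is invented for the example.

```python
import re

# Hypothetical lines copied from a dataset specification: name<TAB>label.
spec_lines = "AGE\tAge (years)\nSEX\tSex\nTRT01P\tPlanned Treatment for Period 01"

# Capture the variable name (group 1) and the label text (group 2) on each line,
# then rebuild the line in the form used inside a SAS LABEL statement.
label_lines = re.sub(r"(?m)^(\w+)\t(.+)$", r'    \1 = "\2"', spec_lines)
label_statement = "label\n" + label_lines + "\n;"
```

The same capture-and-rebuild pattern works for other repetitive edits, such as turning a column of variable names into a KEEP list.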

QT-253 : Implementing a LEAD Function for Observations in a SAS DATA Step
Timothy Harrington, Navitas Data Sciences

A common situation in DATA step processing is the need to reference the value of a variable (column) in a prior or later observation. The SAS system provides the functions LAG and DIFF to return the value of the variable in the prior observation or the difference between the current value of the variable, in the Program Data Vector (PDV), and the prior value. LAGn and DIFFn refer to the nth prior value, where 1<=n<=100, and must not refer to before the first observation (_N_=1) in the dataset. However, there is no corresponding LEAD function which looks at values in observations still to be read into the PDV. This paper demonstrates three different methods of implementing LEAD functionality. The modus operandi of each method is illustrated with examples of SAS code, and the advantages and disadvantages of each method are discussed, as is the suitability of each method for specific types of programming situations.
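
The behavior expected of a LEAD operation can be stated compactly. The sketch below is a Python illustration of that behavior only and does not correspond to any of the three SAS methods in the paper. The function name `lead` and the `default` argument are assumptions.

```python
def lead(values, n=1, default=None):
    """Return, for each position, the value n observations later.

    Positions with no observation n steps ahead receive `default`,
    mirroring how LAG returns missing before the first observation.
    """
    return [
        values[i + n] if i + n < len(values) else default
        for i in range(len(values))
    ]


# Each observation sees the result recorded at the following visit.
next_result = lead([10, 20, 30, 40])
```

Unlike a DATA step, the whole column is available at once here, which is why the look-ahead is trivial in this setting and needs a workaround in SAS.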

QT-280 : A Macro to Add SDTM Supplemental Domain to Standard Domain
Pratap Kunwar, EMMES

Many pharmaceutical and biotechnology companies now prefer to set up Study Data Tabulation Model (SDTM) mapping at the beginning of the study rather than at the end, and to use SDTM datasets to streamline the flow of data from collection through submission. When you have SDTM datasets at your disposal, it is a logical choice to use them for any clinical reports. Getting information from the supplemental (SUPP) domain back to the parent domain is a regular step that programmers can't avoid. However, this step can be very tricky when either (1) the SUPP domain contains multiple types of identifying variables, or (2) the SUPP domain is empty or does not exist. In this presentation, I will present an easily understandable macro that will produce correct results in every possible scenario.
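
The two tricky cases named above (several identifying variables, and an empty or missing SUPP dataset) can be made concrete with a short sketch. It is written in Python, it is not the author's macro, and it makes no claim to cover every scenario. The function name `merge_supp` and the sample records are hypothetical; `USUBJID`, `IDVAR`, `IDVARVAL`, `QNAM` and `QVAL` are the standard SUPP-- variables.

```python
def merge_supp(parent_rows, supp_rows):
    """Attach each SUPP qualifier (QNAM/QVAL) to its matching parent records."""
    supp_rows = supp_rows or []  # treat a missing SUPP dataset like an empty one
    merged = [dict(row) for row in parent_rows]
    for supp in supp_rows:
        idvar = supp.get("IDVAR") or ""
        for row in merged:
            if row["USUBJID"] != supp["USUBJID"]:
                continue
            # A blank IDVAR applies to all of the subject's records; otherwise
            # compare as text, because IDVARVAL is a character variable.
            if idvar == "" or str(row.get(idvar)) == supp["IDVARVAL"]:
                row[supp["QNAM"]] = supp["QVAL"]
    return merged


# Hypothetical AE records with qualifiers keyed by two different IDVARs.
ae = [
    {"USUBJID": "001", "AESEQ": 1, "AESPID": "A1", "AETERM": "HEADACHE"},
    {"USUBJID": "001", "AESEQ": 2, "AESPID": "A2", "AETERM": "NAUSEA"},
]
suppae = [
    {"USUBJID": "001", "IDVAR": "AESEQ", "IDVARVAL": "2", "QNAM": "AETRTEM", "QVAL": "Y"},
    {"USUBJID": "001", "IDVAR": "AESPID", "IDVARVAL": "A1", "QNAM": "AEOTH", "QVAL": "X"},
]
ae_with_supp = merge_supp(ae, suppae)
```

Reading the identifying variable from each SUPP record, rather than hard-coding a single merge key, is what lets one routine handle mixed IDVAR values.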

QT-289 : Highlight changes: An extension to PROC COMPARE
Abhinav Srivastva, Gilead Sciences

Although version control of files, datasets or any document can be challenging, the COMPARE procedure provides an easy way to compare two files and indicate the differences between them. The paper takes the comparison results from PROC COMPARE and builds them into a SAS® macro that highlights changes between files, in terms of additions, deletions or updates to a record, in a convenient Excel format. Some common examples where this utility can be useful are comparing CDISC Controlled Terminology (CT) release versions, comparing medical dictionary versions like MedDRA or WHODrug, or comparing certain Case Report Form (CRF) data like Adverse Events (AE) to review new events being reported at various timepoints for data monitoring purposes.

QT-313 : PROC REPORT – Land of the Missing OBS Column
Ray Pass, Forma Therapeutics

So whatever happened to the OBS column in PROC REPORT, the one that you get for free in PROC PRINT? Well, stop looking for the option to turn it on because it’s not there. The missing column can however be generated pretty painlessly, and learning how to do so also serves to teach you about an important distinction between two very basic types of variables used in PROC REPORT, namely “report variables” and “DATA step variables”. It’ll only take a few lines of code and a few minutes of time, so pay attention and don’t blink.

QT-315 : A SAS macro for tracking the Status of Table, Figure and Listing (TFL) Programming
Yuping Wu, PRA Health Science
Sayeed Nadim, PRA Health Science

A clinical study report (CSR) typically involves the generation of hundreds of TFLs. To efficiently manage such a large batch of outputs, lead programmers and managers often need to know the current status of each program and output. Many organizations use an Excel-based tracking document where production and validation programmers can enter the TFL status. However, for a CSR delivery that usually takes several months to prepare, such a document is often out of date due to ADaM updates, miscommunication between production and validation programmers, etc. This paper introduces a tool that can efficiently monitor the current status of the TFL outputs, covering their logs, PROC COMPARE results, and dataset and output creation.

QT-345 : What are PRX Functions and Call Routines and examples of their application in Clinical SAS Programming
Sarbani Roy, Alnylam

PRX, which stands for Perl Regular Expression, is a family of functions and call routines that were introduced in SAS version 9. Many SAS users are unfamiliar with SAS regular expressions (the RX functions) and Perl Regular Expressions (the PRX functions). These tools are very useful and flexible because they enable you to locate patterns in text strings, and they sometimes provide a much more compact solution to a complicated string manipulation task. PRX functions can be used to replace many of the traditional functions such as INDEX, TRANWRD, and COMPRESS, as well as the LIKE operator. Additionally, they enable us to address more sophisticated issues. They can be used extensively in clinical SAS programming for simplified string manipulation in different domains, including Concomitant Medications, Adverse Events, and Medical History. This paper gives an overview of the PRX functions and call routines and illustrates with examples how they can be used by clinical statistical programmers for SDTM, ADaM, and TLF programming.
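
As a rough illustration of the kind of pattern matching described above, the sketch below uses Python's `re` module, whose syntax is largely Perl-compatible, as a stand-in for the SAS PRX functions. The function name `parse_dose`, the unit list, and the sample medication strings are assumptions made for this example; they do not come from the paper.

```python
import re

# One pattern captures a numeric dose and its unit from free text,
# a task that would otherwise need several INDEX/SUBSTR-style calls.
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(MG|MCG|G|ML)\b", re.IGNORECASE)


def parse_dose(text):
    """Return (dose, unit) from a medication string, or None if absent."""
    match = DOSE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).upper()


examples = ["ASPIRIN 81 MG TABLET", "Metformin 0.5g", "PARACETAMOL"]
parsed = [parse_dose(s) for s in examples]
```

The same regular expression could be used nearly unchanged with the PRX functions, since both follow Perl conventions for groups, quantifiers, and word boundaries.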

QT-349 : Macro for controlling Page Break options for Summarizing Data using Proc Report
Sachin Aggarwal, Rang Technologies
Sapan Shah, Rang Technologies

When statistical programmers use PROC REPORT to generate summary tables and listings, they face a number of page-break challenges: (1) printing a certain number of non-missing lines on one page; (2) a new parameter starting at the end of a page; (3) starting different parameters on different pages; (4) keeping groups and sub-groups together on one page; (5) continuing the main group heading when splitting the main group and subgroup is unavoidable (for example, adverse events); and (6) adding a suffix when a group splits onto a different page. This paper presents the Page Break Macro, which addresses these challenges. The macro uses basic programming logic to generate output SAS datasets, which are then used in PROC REPORT to generate summary reports per the clinical study requirements. Used in combination with existing reporting macros at pharmaceutical companies, this macro can help cut down the programming time spent formatting tables and listings.

QT-354 : Updates to A Macro to Automatically Flag Baseline SDTM
Taylor Markway, SCRI Development Innovations

This paper takes the macro published in PharmaSUG 2016 – Paper QT19: A Macro to Automatically Flag Baseline in SDTM and updates it for the changes in SDTMIG v3.3, with further refinement to the programming. SDTMIG v3.3 introduced the new --LOBXFL variable and made the former --BLFL variable permissible where it was previously expected. By checking the input dataset and following SDTM assumptions, the macro still requires only an input and an output dataset, but allows optional parameters to be set manually if adjustments are needed for special cases. This paper also highlights the code changes between the prior and current versions and shares the lessons learned.
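
To make the flagging idea concrete, here is a minimal Python sketch of a last-observation-before-exposure flag. It is not the paper's macro. The simplified column names (`USUBJID`, `TESTCD`, `DTC`, `ORRES`), the function name, and the rule that a record must be strictly earlier than the first dose are all assumptions for illustration; datetimes are assumed to be ISO 8601 strings of equal precision, so string comparison matches chronological order.

```python
def flag_last_before_exposure(records, first_dose):
    """Set LOBXFL = "Y" on the last non-missing pre-dose record
    per subject and test; all other records get an empty flag.

    records: list of dicts with USUBJID, TESTCD, DTC, ORRES.
    first_dose: dict mapping USUBJID to first-dose datetime string.
    """
    best = {}  # (subject, test) -> index of current candidate record
    for i, rec in enumerate(records):
        rec["LOBXFL"] = ""
        start = first_dose.get(rec["USUBJID"])
        if start is None or rec["ORRES"] in ("", None):
            continue  # never dosed, or missing result
        if rec["DTC"] >= start:
            continue  # not before first exposure
        key = (rec["USUBJID"], rec["TESTCD"])
        if key not in best or rec["DTC"] > records[best[key]]["DTC"]:
            best[key] = i
    for i in best.values():
        records[i]["LOBXFL"] = "Y"
    return records


labs = [
    {"USUBJID": "S1", "TESTCD": "ALT", "DTC": "2020-01-01T08:00", "ORRES": "30"},
    {"USUBJID": "S1", "TESTCD": "ALT", "DTC": "2020-01-05T08:00", "ORRES": "32"},
    {"USUBJID": "S1", "TESTCD": "ALT", "DTC": "2020-01-09T08:00", "ORRES": ""},
    {"USUBJID": "S1", "TESTCD": "ALT", "DTC": "2020-01-12T08:00", "ORRES": "40"},
]
flagged = flag_last_before_exposure(labs, {"S1": "2020-01-10T09:00"})
```

Here the record on 2020-01-09 is skipped because its result is missing, so the 2020-01-05 record receives the flag.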

Real World Evidence and Big Data

RW-053 : NHANES Dietary Supplement component: a parallel programming project
Jayanth Iyengar, Data Systems Consultants LLC

The National Health and Nutrition Examination Survey (NHANES) contains many sections and components which report on and assess the nation's health status. A team of IT specialists and computer systems analysts handle data processing, quality control, and quality assurance for the survey. The most complex section of NHANES is dietary supplements, from which five publicly released data sets are derived. Because of its complexity, the Dietary Supplements section is assigned to two SAS programmers who are responsible for completing the project independently. This paper reviews the process for producing the Dietary Supplements section of NHANES, a parallel programming project, conducted by the National Center for Health Statistics, a center of the Centers for Disease Control (CDC).

RW-113 : Better to Be Mocked Than Half-Cocked: Data Mocking Methods to Support Functional and Performance Testing of SAS® Software
Troy Hughes, Datmesis Analytics

Data mocking refers to the practice of manufacturing data that can be used in software functional and performance testing, including both load testing and stress testing. Mocked data are not production or “real” data, in that they do not abstract some real-world construct, but are considered to be sufficiently similar (to production data) to demonstrate how software would function and perform in a production environment. Data mocking is commonly employed during software development and testing phases and is especially useful where production data may be sensitive or where it may be infeasible to import production data into a non-production environment. This text introduces the MOCKDATA SAS® macro, which creates mock data sets and/or text files for which SAS practitioners can vary (through parameterization) the number of observations, number of unique observations, randomization of observation order, number of character variables, length of character variables, number of numeric variables, highest numeric value, percentage of variables that have data, and whether character and/or numeric index variables (which cannot be missing) exist. An example implements MOCKDATA to compare the input/output (I/O) processing performance of SAS data sets and flat files, demonstrating the clear performance advantages of processing SAS data sets in lieu of text files.

RW-161 : Visualization of Big Data Generating from Real-Time Monitoring of Clinical Trial
Zhouming(Victor) Sun, Astrazeneca

With the rapid and inevitable innovations in digital health and personalized medicine, sensor data in clinical trials will enhance scientific value and analysis precision. However, real-time monitoring data can be difficult to process, analyze, and interpret due to its volume. Graphical visualization of big data is powerful for generating meaningful insights and clarity that cannot easily be obtained from traditional data outputs. This paper introduces different approaches to the graphical representation of data from overweight and obese patients wearing activity-monitoring wristbands over the course of an example trial. Weights are measured daily, while activity (e.g., exercise, steps/movements, energy expenditure) is monitored in real time every minute for three months. The generated graphical visualizations show how weight levels are affected by daily activities and energy consumption.

RW-192 : Natural History Study – A Gateway to Treat Rare Disease
Tabassum Ambia, Alnylam Pharmaceuticals, Inc

A natural history study follows a group of people over time who have, or are at risk of developing, a specific medical condition or disease. Natural history studies are of significant importance in the discovery, marketing, and post-marketing phases of a drug. There are different types of natural history studies, which may help to determine the need for adequate treatment in a target population or to assess the outcome of a treatment in real life. Real-world evidence (RWE) data are the primary source of health information used in a natural history study. Statistical analysis focuses on both the incidence and the prevalence of the disease, involving different procedures to determine the distribution of characteristics or events over a certain time and their correlation with covariates. A rare disease is one that affects a very small percentage of the population. Recently, FDA has emphasized the importance of natural history studies for the development of orphan drugs for rare diseases. Natural history studies can also help research into treatments for non-rare diseases. This paper will discuss the basics of natural history studies in the context of rare diseases and orphan drugs, real-world evidence data for natural history studies, types of natural history studies with their analysis techniques and limitations, difficulties in the development of orphan drugs that can be mitigated through natural history studies, and FDA recommendations for using natural history studies in RCTs to develop treatments for rare diseases.

RW-275 : Simple uses of PROC CLUSTER: Pruning big datasets and identifying data issues
Valentina Aguilar, Syneos Health

As part of our programming activities for SDTM and ADaM datasets, we often use PROC FREQ for categorical variables and PROC MEANS for continuous variables to check controlled terminology or find data anomalies. The SAS clustering capabilities can be highly useful as well. In this paper I present two uses of PROC CLUSTER. The first is an example of using the output dataset of PROC CLUSTER for quality control. In the second example, I present a case where a large dataset from a position-tracking wearable is pruned using PROC CLUSTER to a smaller size while keeping the relevant information of the whole.

RW-319 : Standardizing Laboratory Data From Diverse RWD to Enable Meaningful Assessments of Drug Safety and Effectiveness
Irene Cosmatos, UBC
Michael Bulgrien, UBC

Laboratory results available in electronic medical record (EMR) data are becoming increasingly critical for the assessment of drug safety and effectiveness. However, unlike diagnoses and medications where vocabularies for data capture are well established, the recording of laboratory results in EMR databases is often not standardized, and differences occur across and even within EMR systems. This heterogeneity can create significant challenges for epidemiologic analyses requiring cohort and/or outcome definitions based on lab criteria. This project standardized diverse laboratory data from US and non-US EMR databases into a common structure to enable analyses to be compared across data sources that do not use a standardized coding system such as Logical Observation Identifiers Names and Codes (LOINC). UBC’s database analysts and clinicians developed an approach to transform laboratory results from diverse RWD into a cohesive and accurate dataset based on standardized units and test names, while minimizing loss of data. Close clinical scrutiny and unit conversions were required to enable a common data structure. The effort focused on 3 liver function tests: ALT, AST and Total Bilirubin, using 3 data sources: 2 US and 1 EU. UBC will discuss our analytic and clinical challenges, such as differences between the EU and US in naming conventions of laboratory tests, and their resolutions. Results of this data standardization effort will be demonstrated, highlighting key issues that impact defining a study cohort or outcome based on quantitative lab criteria. This presentation will be relevant to any user interested in standardizing vocabularies used in RWD.

RW-372 : Life After Drug Approval… What Programmers Need to Know About REMS
Cara Lacson, United BioSource Corporation
Carol Matthews, UBC

While most of the spotlight in the drug development process focuses on clinical trials and the effort to get drugs approved, FDA is turning to Risk Evaluation and Mitigation Strategy (REMS) as a way to approve drugs they may have safety concerns about while closely monitoring those potential safety issues once the drug is approved. A REMS often involves drugs that have a high risk for specific adverse events, and the FDA requires manufacturers to put a program in place to mitigate those risks. REMS are a growing sector of the market, and will only continue to grow in the future. Therefore, it is increasingly valuable to know the “ins and outs” of how to approach programming with REMS data, as it is very different from clinical trial programming. This paper explores the basic concepts of a REMS from a programming perspective: from a high level explanation of what a REMS is to the main differences programmers will see between clinical trials and REMS. We will discuss what types of data are typically included in REMS, what data issues to expect, general table structures and reported statistics, and how to effectively report data from an ongoing/changing database throughout the life of a REMS.

Software Demonstrations (Tutorials)

SD-121 : Integrating SAS and Microsoft Excel: Exploring the Many Options Available to You
Vince DelGobbo, SAS

This presentation explains different techniques available to you when importing and exporting SAS and Microsoft Excel data. You learn how to import Excel data into SAS using the IMPORT procedure, the SAS DATA step, SAS Enterprise Guide, and other methods. Exporting data and analytical results from SAS to Excel is performed using the EXPORT procedure, the SAS DATA step, SAS Enterprise Guide, the SAS Output Delivery System (ODS), and other tools. The material is appropriate for all skill levels, and the techniques work with various versions of SAS software running on the Windows, UNIX (including Linux), and z/OS operating systems. Some techniques require only Base SAS and others require the SAS/ACCESS Interface to PC Files.

SD-281 : A Single, Centralized, Biometrics Team Focused Collaboration System for Analysis Projects
Chris Hardwick, Zeroarc
Justin Slattery, Zeroarc
Hans Gutknecht, Zeroarc

Statisticians and programmers work in concert with each other to complete analysis projects and typically resort to using Excel and email to store metadata, track project milestones and collaborate with each other. See how using a single, centralized system to store TFL/Dataset metadata, communicate key analysis project milestones, report on QC efforts, conduct collaborative TFL reviews, and track TFL change requests can dramatically reduce data entry and reap big productivity benefits.

Statistics and Analytics

SA-013 : Diaries and Questionnaires: Challenges and Solutions
Marina Komaroff, Noven Pharmaceuticals
Sandeep Byreddy, Noven Pharmaceuticals, Inc.

In the PharmaSUG 2019 conference opening session, there was a question about the most unfavorable data set for programmers and statisticians to work with. Diaries and questionnaires (QS) were named among the first five! The rationale was the complexity of QS: too many items to work with, and responses that are hard to compare across time points, within and between subjects. Clustering and/or categorization of diary items using clinical judgement is a known approach that helps with analyses. However, comparing the responses to questions across multiple time points still requires a deep understanding of the research question and strong programming skills. The goal of this paper is to convert diaries-haters to diaries-lovers and explain how an appropriate algorithm should be developed and programmed. As an example, the research question was to detect fraud in filling out the diaries by checking whether subjects repeat the same responses (possibly randomly changing a couple of points) across different time points of the study. The authors suggest an algorithm and provide a SAS® macro to answer this research question; yet, this program can be easily adapted for other needs.
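
One simple way to picture the repeated-response check is to compare every pair of time points for a subject and flag pairs whose answers differ in only a few items. The Python sketch below is an assumption-laden illustration, not the authors' algorithm: the function name, the `max_diff` threshold, and the sample responses are all hypothetical.

```python
from itertools import combinations


def near_duplicate_visits(diary, max_diff=2):
    """Flag pairs of visits whose item responses are nearly identical.

    diary: dict mapping visit number -> list of item responses
           (all lists the same length, same item order).
    Returns a list of (visit_1, visit_2, n_items_different).
    """
    flagged = []
    for (v1, r1), (v2, r2) in combinations(sorted(diary.items()), 2):
        n_diff = sum(a != b for a, b in zip(r1, r2))
        if n_diff <= max_diff:
            flagged.append((v1, v2, n_diff))
    return flagged


# One subject, eight diary items, three visits (made-up values).
subject_diary = {
    1: [3, 2, 4, 1, 5, 2, 3, 4],
    2: [3, 2, 4, 1, 5, 2, 3, 3],  # differs from visit 1 in one item
    3: [1, 4, 2, 5, 3, 1, 4, 2],
}
suspicious = near_duplicate_visits(subject_diary)
```

A flagged pair is only a signal for review; stable symptoms can legitimately produce identical answers, so the threshold would need clinical input.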

SA-021 : Agile Technology Integration Enables Analytic Development & Speed to Production
Bob Matsey, Teradata

Pharma companies are looking for agile, self-service solutions that let their analysts and data scientists run analytics quickly, without delays from IT, and succeed or fail fast in a massively parallel database environment. They want a seamless, open architecture that allows business users to work with whatever tools they are comfortable with while managing and loading all their various data types for discovery in an easy-to-use, flexible environment. They need the ability to quickly explore, prototype, and test new theories in a self-serve environment that does not depend on IT all the time. The data that business analysts need to load ranges from Hadoop, Excel spreadsheets, SQL Server data, Oracle, SAS data sets, and external data to other DBMSs and non-structured data, all of which needs to be analyzed alongside traditional data warehouse data. This presentation outlines a simple path for building an environment that allows this to happen quickly and seamlessly. It provides a path to establishing an integrated Teradata and SAS in-database agile environment that is open to other end-user tools, allowing you to join and analyze S3, Hadoop, and other data types with your EDW data in a single, self-serve solution. It also discusses strategies for bringing various data types in for analytics, giving analysts the ability to find those valuable nuggets of new data attributes and information that are important for the organization to grow and develop new business opportunities.

SA-034 : A Doctor's Dilemma: How Propensity Scores Can Help Control For Selection Bias in Medical Education
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine

An important strength of observational studies is the ability to estimate a key behavior or treatment’s effect on a specific health outcome. This is a crucial strength as most health outcomes research studies are unable to use experimental designs due to ethical and other constraints. Keeping this in mind, one drawback of observational studies (that experimental studies naturally control for) is that they lack the ability to randomize their participants into treatment groups. This can result in the unwanted inclusion of a selection bias. One way to adjust for a selection bias is through the utilization of a propensity score analysis. In this paper we explore an example of how to utilize these types of analyses. In order to demonstrate this technique, we will seek to explore whether clerkship order has an effect on NBME and USMLE exam scores for 3rd year military medical students. In order to conduct this analysis, a selection bias was identified and adjustment was sought through three common forms of propensity scoring: stratification, matching, and regression adjustment. Each form is separately conducted, reviewed, and assessed as to its effectiveness in improving the model. Data for this study was gathered between the years of 2014 and 2019 from students attending USUHS. This presentation is designed for any level of statistician, SAS® programmer, or data scientist/analyst with an interest in controlling for selection bias.

SA-051 : Calculation of Cochran–Mantel–Haenszel Statistics for Objective Response and Clinical Benefit Rates and the Effects of Stratification Factors
Girish Kankipati, Seattle Genetics Inc
Chia-Ling Ally Wu, Seattle Genetics

In oncology clinical trials, primary and secondary endpoints are analyzed using different statistical models based on the study design. Objective response rate (ORR) and clinical benefit rate (CBR) are commonly used as key endpoints in oncology studies, in addition to overall survival (OS) and progression-free survival (PFS). The use of ORR and CBR as an endpoint in these trials is widespread as objective response to therapy is usually an early indication of treatment activity and it can be assessed in smaller samples compared to OS; furthermore, FDA considers ORR and CBR as clinical and surrogate endpoints in traditional and accelerated approvals. Bringing new therapies to market based on ORR and CBR requires specialized statistical methodology that not only accurately analyzes these key endpoints but can also accommodate stratified study designs aimed at controlling for confounding factors. The Cochran-Mantel-Haenszel (CMH) test provides a solution to address these many needs. This paper introduces CMH test concepts, describes how to interpret its statistics, and shares insights into SAS® procedure settings to use it correctly. The calculation of ORR and CBR with 95% confidence intervals using the Clopper-Pearson method and relative risk and strata-adjusted p-values using the CMH test are discussed with sample data and example table shells, along with examples of how to use the FREQ procedure to calculate these values.
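
For readers who want to see the arithmetic behind the stratified test, the sketch below computes the CMH statistic (without continuity correction), its p-value on one degree of freedom, and the Mantel-Haenszel common relative risk for a set of 2x2 tables. It is a plain-Python illustration with made-up counts, not the paper's PROC FREQ code; the function name and the table layout (a, b, c, d) are assumptions.

```python
import math


def cmh_test(strata):
    """CMH test for K 2x2 tables.

    strata: list of (a, b, c, d) per stratum, where
      a = treatment responders,  b = treatment non-responders,
      c = control responders,    d = control non-responders.
    Returns (cmh_statistic, p_value, mh_relative_risk).
    """
    sum_a = sum_expected = sum_var = 0.0
    rr_num = rr_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d  # each stratum must have n > 1
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_expected += row1 * col1 / n
        sum_var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        rr_num += a * row2 / n
        rr_den += c * row1 / n
    q = (sum_a - sum_expected) ** 2 / sum_var
    # Upper tail of a chi-square with 1 df, via the normal distribution.
    p = math.erfc(math.sqrt(q / 2.0))
    return q, p, rr_num / rr_den


# Two hypothetical strata of 40 subjects each.
q, p, rr = cmh_test([(10, 10, 5, 15), (12, 8, 6, 14)])
```

With these counts the treatment response rate is twice the control rate in both strata, so the Mantel-Haenszel relative risk is exactly 2.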

SA-056 : ADaM Implementation of Immunogenicity Analysis Data in Therapeutic Protein Drug Development
Jiannan Kang, Merck

This paper will focus on the design of immunogenicity analysis dataset in support of ADA-PK analysis in Therapeutic Protein Drug development. It will consist of an overview of multi-tier immunogenicity assessment and discussion on the comparison of two analysis dataset structures covering both collected sample assessments and derived subject assessments. The readiness for dataset creation, QC, analysis and reporting, and submission compliance will be highlighted.

SA-072 : Assigning agents to districts under multiple constraints using PROC CLP
Stephen Sloan, Accenture
Kevin Gillette, Accenture Federal Services

The Challenge: assigning outbound calling agents in a telemarketing campaign to geographic districts. The districts have a variable number of leads and each agent needs to be assigned entire districts with the total number of leads being as close as possible to a specified number for each of the agents (usually, but not always, an equal number). In addition, there are constraints concerning the distribution of assigned districts across time zones, in order to maximize productivity and availability. Our Solution: uses the SAS/OR® procedure PROC CLP to formulate the challenge as a constraint satisfaction problem (CSP), since the objective is not necessarily to minimize a cost function, but rather to find a feasible solution to the constraint set. The input consists of the number of agents, the number of districts, the number of leads in each district, the desired number of leads per agent, the amount by which the actual number of leads can differ from the desired number, and the time zone for each district.

SA-103 : Risk-based and Exposure-based Adjusted Safety Incidence Rates
Qiuhong Jia, Seattle Genetics
Fang-Ting Kuo, Seattle Genetics
Chia-Ling Ally Wu, Seattle Genetics
Ping Xu, Seattle Genetics

In clinical trials, safety event incidences are summarized to help analyze the safety profile of investigational drugs. The most common and straightforward method is the crude rate, which is the number of subjects with at least one event of interest divided by the total number of subjects in a given population. However, if the average duration of exposure differs significantly between treatment groups within a trial, or between trials included in an analysis, due to differential drop-out rates or study design, such incidence rates may need statistical adjustment to make the comparison meaningful. The analysis of exposure-adjusted incidence rates is often useful in such cases. This paper introduces a simplified exposure-adjusted rate, where the sum of treatment durations in a population is used as the denominator, as well as a time-at-risk adjusted rate, where the sum of person-time at risk for each event of interest is used as the denominator. Person-time at risk for each subject is usually defined as the time from treatment start to the first onset of an event, or to the end of follow-up if the event does not occur. Step-by-step SAS® code to derive these adjusted rates is examined using hypothetical adverse event data, and the statistical implications are discussed in detail in comparison with the crude incidence rates. Further examples of exposure-adjusted analysis considerations, such as presenting results (e.g., exposure-adjusted rate differences) in forest plots with confidence intervals, will also be demonstrated.
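
The three rates described above differ only in their denominators, which a small worked example makes clear. This Python sketch uses made-up subject data and hypothetical field names (`trt_days`, `followup_days`, `event_day`); it illustrates the definitions in the abstract rather than the paper's SAS code.

```python
def incidence_rates(subjects):
    """Crude, exposure-adjusted, and time-at-risk adjusted rates.

    subjects: list of dicts with
      trt_days      - treatment duration in days,
      followup_days - follow-up time in days,
      event_day     - day of first event onset, or None if no event.
    Adjusted rates are events per person-day.
    """
    n = len(subjects)
    n_events = sum(s["event_day"] is not None for s in subjects)
    total_exposure = sum(s["trt_days"] for s in subjects)
    # Time at risk stops at first onset, else at end of follow-up.
    total_at_risk = sum(
        s["event_day"] if s["event_day"] is not None else s["followup_days"]
        for s in subjects
    )
    return {
        "crude": n_events / n,
        "exposure_adjusted": n_events / total_exposure,
        "time_at_risk": n_events / total_at_risk,
    }


subjects = [
    {"trt_days": 100, "followup_days": 100, "event_day": 40},
    {"trt_days": 200, "followup_days": 200, "event_day": None},
    {"trt_days": 100, "followup_days": 100, "event_day": 60},
    {"trt_days": 300, "followup_days": 300, "event_day": None},
]
rates = incidence_rates(subjects)
```

Here the crude rate is 2/4, the exposure-adjusted rate is 2/700 per person-day, and the time-at-risk rate is 2/600 per person-day; multiplying the adjusted rates by 36,525 would express them per 100 person-years.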

SA-112 : Should I Wear Pants? And Where Should I Travel in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures
Troy Hughes, Datmesis Analytics
Louise Hadden, Abt Associates Inc.

Decision tables operationalize one or more contingencies and the respective actions that should be taken when contingencies are true. Decision tables capture conditional logic in dynamic control tables rather than hardcoded programs, facilitating maintenance and modification of the business rules and decision rules they contain—without the necessity to modify the underlying code (that interprets and operationalizes the decision tables). This text introduces a flexible, data-driven SAS® macro that ingests decision tables—maintained as comma-separated values (CSV) files—into SAS to dynamically write conditional logic statements that can subsequently be applied to SAS data sets. This metaprogramming technique relies on SAS temporary arrays that can accommodate limitless contingency groups and contingencies of any content. To illustrate the extreme adaptability and reusability of the software solution, several decision tables are demonstrated, including those that separately answer the questions Should I wear pants and Where should I travel in the Portuguese expanse? The DECISION_TABLE SAS macro is included and is adapted from the author’s text: SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality.
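
The idea of keeping conditional logic in a CSV control table rather than in code can be sketched briefly. The Python example below is only an illustration of the concept, not the DECISION_TABLE macro: the table contents, the `*` wildcard convention, the first-match-wins rule, and the function names are all assumptions.

```python
import csv
import io

# A hypothetical decision table; "*" means "any value".
TABLE_CSV = """weather,location,action
cold,*,wear pants
*,office,wear pants
hot,beach,wear shorts
"""


def load_rules(text):
    """Read the decision table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def decide(rules, facts, default=None):
    """Return the action of the first rule whose conditions all match."""
    for rule in rules:
        conditions = (k for k in rule if k != "action")
        if all(rule[k] in ("*", str(facts[k])) for k in conditions):
            return rule["action"]
    return default


rules = load_rules(TABLE_CSV)
choice = decide(rules, {"weather": "hot", "location": "beach"})
```

Changing a business rule then means editing a row of the CSV file; the `decide` function itself never changes.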

SA-147 : Programmatic method for calculating Main Effect Interaction p-value in SAS 9.4 using GLMMOD to create Indicator Variables and associated Design Matrix.
Christopher Brown, Frontier Science Scotland
Christine Campbell, Frontier Science Scotland

SAS 9.4 contains the PHREG procedure, which is used in the analysis of survival data [REF1]. It can be used to intrinsically generate a p-value for the interactions between treatment and main effects where all variables have no more than 2 levels [REF2]. To generate the interaction p-value where variables can have more than 2 levels, one should avoid the SAS implementation of the Type3 test [REF3] and avoid changing categorical variables into numerical ones in order to utilise the intrinsic features of PHREG [REF4]. Instead, one should generate indicator variables and then select the appropriate products from the design matrix to test in the Cox model. The desired result can be achieved by using GLMMOD to create the indicator variables and associated design matrix and then programmatically selecting the relevant indicator variables, which are those associated with the individual main effects and their interactions. The design matrix and the selected indicator variables are then fed into PHREG, and a ‘Linear Hypothesis Test’ is performed which yields the interaction p-value we require [REF5]. In this paper we demonstrate a complete implementation of a programmatic solution utilising a custom macro which analyses a standard Time-to-Event ADaM dataset and generates the p-value of the main effect and interactions between two specified variables. This is most commonly performed on an ADTTE dataset, or a variant thereof.

SA-149 : Sample size and HR confidence interval estimation through simulation of censored survival data controlling for censoring rate
Giulia Tonini, Menarini Ricerche
Letizia Nidiaci, Menarini Ricerche
Simona Scartoni, Menarini Ricerche

Time-to-event variables are commonly used in clinical trials where survival data are collected. When planning a trial and calculating the sample size, it is important to estimate not only the expected sample size but also the minimum value of the Hazard Ratio (HR) for which the trial can be considered successful (i.e., accepting the alternative hypothesis). The confidence interval for the HR when the difference between treatments is statistically significant can be calculated by simulating censored survival data. Simulating censoring can be tricky, especially when the trial does not have a fixed duration or when a high drop-out rate is expected. This is often the case in oncology trials, where patients exit the study prematurely (e.g., in case of AE or disease progression). In this work we present a case study where a dataset is simulated controlling for the resulting censoring rate. In particular, we simulate data from an oncology trial comparing two treatments. We give an example of how to implement that simulation in SAS. We compare results coming from different assumptions regarding drop-out rate and methods to model censoring. We investigate the effect of the censoring rate on potential bias in estimating the treatment effect using a Cox model.
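
One textbook way to hit a target censoring rate, shown here as an assumption rather than the authors' method, is to draw both event and censoring times from independent exponential distributions. If the event hazard is lambda and the censoring hazard is mu, the probability that a subject is censored is mu / (lambda + mu), so a target censoring rate p (with 0 < p < 1) is obtained with mu = lambda * p / (1 - p). The Python sketch below uses hypothetical hazards and function names.

```python
import random


def simulate_arm(n, hazard, censor_rate, rng):
    """Simulate n subjects with exponential event and censoring times.

    hazard:      event hazard (lambda) for this arm.
    censor_rate: target proportion censored, strictly between 0 and 1.
    Returns a list of (observed_time, event_indicator) tuples,
    where event_indicator is 1 for an event and 0 for censoring.
    """
    # Censoring hazard chosen so that P(censored) = censor_rate.
    mu = hazard * censor_rate / (1.0 - censor_rate)
    data = []
    for _ in range(n):
        t_event = rng.expovariate(hazard)
        t_censor = rng.expovariate(mu)
        data.append((min(t_event, t_censor), int(t_event <= t_censor)))
    return data


rng = random.Random(2020)
control = simulate_arm(20000, hazard=0.10, censor_rate=0.30, rng=rng)
treated = simulate_arm(20000, hazard=0.07, censor_rate=0.30, rng=rng)  # HR = 0.7
observed_censoring = 1 - sum(e for _, e in control) / len(control)
```

Because the censoring hazard is scaled to each arm's event hazard, both arms reach the same expected censoring rate; other censoring mechanisms, such as administrative censoring at a fixed cutoff, would need a different calibration.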

SA-207 : A Brief Introduction to Performing Statistical Analysis in SAS, R & Python
Erica Goodrich, Brigham and Women's Hospital
Daniel Sturgeon, Priority Health

Statisticians and data scientists may utilize a variety of programs available to them to solve specific analytical questions at hand. Popular programs include commercial products like SAS and open source products including R and Python. Reasons why a user may want to use differing programs will be discussed. This presentation aims to present a brief primer on the coding and output provided within these programs to perform data exploration and fit commonly used statistical models in a healthcare or clinical space.

SA-225 : Principal Stratum strategy for handle intercurrent events: a causal estimand to avoid biased estimates
Andrea Nizzardo, Menarini Ricerche
Giovanni Marino Merlo, Menarini Ricerche
Simona Scartoni, Menarini Ricerche

The ICH E9 (R1) addendum “Estimands and Sensitivity Analysis in Clinical Trials” emphasizes the need to better quantify treatment effects by addressing the occurrence of intercurrent events that could lead to ambiguity in the estimates. One approach proposed in the addendum is the “Principal Stratum strategy”, where the target population for the analysis is a sub-population composed of patients free from intercurrent events. The main concern is that it is not possible to identify the stratum in advance, and the analysis is then not causal and liable to confounding. Moreover, the occurrence of intercurrent events is not predictable, and each subject is observed on one treatment only and could experience different intercurrent events on different treatments. The FDA Missing Data Working Group also recommends the use of a causal estimand for the evaluation of primary endpoints of interest. A causal estimand based on the principal stratum, with a related tipping point sensitivity analysis, is then proposed for a clinical superiority study. A case study is shown. A simulation-based evaluation of the power obtained with this approach is also presented, comparing the ideal situation in which all patients are free from intercurrent events with scenarios in which different percentages of patients are assumed to have intercurrent events.

SA-234 : Bayesian Based Dose Escalation Clinical Trial Designs
Krishnakumar K P, Cognub Decision Solutions Pvt. Ltd.
Aswathy S, Cognub Decision Solutions Pvt. Ltd.

In Phase I clinical trials, investigational drugs are administered to healthy volunteers to determine the recommended dose and toxicity profile of an investigational drug. In addition to evaluating safety and the tolerable dosage, investigators also look at parameters such as pharmacokinetic or pharmacodynamic outcomes. In Phase I trials, dose-escalation methods play an important role in establishing the MTD using the minimum possible number of patients, so as to avoid exposing too many patients to therapeutic doses. Adaptive features of Phase I dose-finding studies are of great interest in pharmaceutical and medical research. Implementing model-based designs generates more accurate dose recommendations for later-stage testing and thereby increases the efficiency and likelihood of successful drug development. In this paper, we compare the commonly used and much simpler 3+3 design to more sophisticated model-based designs such as the modified toxicity probability interval method or the modified continual reassessment method. The application of Bayesian adaptive methods is more prevalent in Phase I and II trials due to their flexibility and effectiveness. The paper elucidates Bayesian-based dose-escalation clinical trial designs and compares them to other traditional approaches applied in this area.

SA-243 : Imputation for Missing Dosing Time in NONMEM PopPK Datasets
Shuqi Zhao, Merck & Co., Inc.

For late stage clinical trials with an oral daily dosing regimen, complete dosing history data are generally not available. Following the study design, patients have PK samples drawn at clinic sites, but between scheduled visits they are expected to take daily doses at home. Both date and time are collected for PK samples, while the accurate time at which the dose was taken is not available for most doses. When building NONMEM-ready PopPK datasets, however, both date and time are required on dosing records to calculate actual relative time from first dose and actual relative time from last dose. Because the dosing history is incomplete, time imputation comes into play. Imputation methods can differ depending on study design and data collection design, but the general rule is that the relative time of the dose to the PK sample is not changed. This paper provides a step-by-step programming guide on how to impute time for dosing records when actual time information is only available for the dose prior to the PK sample. Topics such as how to unfold dosing records in the EX domain into individual daily records and commonly seen data issues in relevant datasets will also be discussed.
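As a rough illustration of unfolding an interval dosing record into daily records with imputed times, here is a minimal Python sketch. It does not reproduce the paper's method: the function `unfold_doses`, the carry-forward imputation rule, and the 08:00 default are assumptions chosen only to show that recorded dose times (for example, the dose before a PK sample) are left unchanged.

```python
from datetime import date, datetime, time, timedelta


def unfold_doses(start, end, known_times, default=time(8, 0)):
    """Expand one dosing interval [start, end] into daily dose records.

    known_times maps a date to the actual recorded dosing time for that day.
    Assumed rule (hypothetical): days without a recorded time carry forward
    the most recent recorded time, or use `default` before any is known.
    """
    records = []
    current = default
    day = start
    while day <= end:
        actual = day in known_times
        if actual:
            current = known_times[day]  # recorded time is never altered
        records.append(
            {"dose_dt": datetime.combine(day, current), "imputed": not actual}
        )
        day += timedelta(days=1)
    return records
```

Relative time from first dose can then be computed as the difference between each `dose_dt` and the first record's `dose_dt`; because recorded times are kept as collected, the interval between a recorded dose and its PK sample is unaffected by the imputation.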

SA-262 : Using SAS Simulations to determine appropriate Block Size for Subject Randomization Lists
Kevin Venner, Almac Clinical Technologies
Jennifer Ross, Almac Clinical Technologies
Kyle Huber, Almac Clinical Technologies
Noelle Sassany, Almac Clinical Technologies
Graham Nicholls, Almac Clinical Technologies

The ICH E9 (Regulatory) guideline specifies that a minimum block size should not be used for the generation of Randomization lists, to avoid predictability/selection bias and to avoid full or partial unblinding to treatment assignment. Additionally, knowledge of the block size used within the Randomization list should be restricted (typically known by the Sponsor Biostatistician only). However, in reality Sponsor Biostatisticians repeatedly utilize the same (default) block size for list generation (e.g. always use a block size of 4 for 2 treatments with an equal allocation ratio) and/or utilize the minimum block size. Thus, Sponsor (and Site) personnel are not really ‘blinded’ to the utilized block size, potentially compromising the entire study’s validity. This paper will illustrate how SAS is an effective simulation tool to provide evidence that different block size designs/parameters can yield acceptable balance without having to use the minimum or most obvious/default block size. Case study examples will demonstrate how simulations can evaluate expected treatment balance for alternative block size designs. Through use of SAS macro programming, different randomization block size designs can be efficiently simulated through minor macro updates, allowing for quick delivery of statistically robust results. While Treatment balance for a clinical trial can be critical for establishing effectiveness, perfect Treatment balance is not required. The goal of randomization is to maintain the integrity of the study (keeping the study blind, avoiding bias) and to achieve acceptable balance. Simulation results can provide data-driven evidence that acceptable balance can be achieved with alternative block size designs.
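The kind of simulation described can be sketched in a few lines. The paper uses SAS macros; the Python below is only an illustrative stand-in under stated assumptions: two arms with 1:1 allocation, a fixed even block size, enrollment that may stop mid-block, and hypothetical function names `block_list` and `mean_imbalance`.

```python
import random


def block_list(block_size, n_subjects, rng):
    """Build a 1:1 permuted-block list for arms A and B, cut at n_subjects."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    assignments = []
    while len(assignments) < n_subjects:
        block = ["A", "B"] * (block_size // 2)
        rng.shuffle(block)  # random permutation within each block
        assignments.extend(block)
    return assignments[:n_subjects]  # the last block may be incomplete


def mean_imbalance(block_size, n_subjects, n_sims=2000, seed=1):
    """Average absolute difference in arm sizes over simulated lists."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        arms = block_list(block_size, n_subjects, rng)
        total += abs(arms.count("A") - arms.count("B"))
    return total / n_sims
```

Comparing `mean_imbalance(4, n)` with `mean_imbalance(6, n)` or `mean_imbalance(8, n)` for a planned enrollment `n` shows how much balance is given up by moving away from the smallest block size; with 1:1 allocation the imbalance can never exceed half the block size.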

SA-284 : Implementing Quality Tolerance Limits at a Large Pharmaceutical Company
Steven Gilbert, Pfizer

Predefined quality tolerance limits (QTLs) were introduced in the revised ICH E6 (R2) Section 5 update to help identify systematic issues that can impact subject safety or reliability of trial results. This paper will focus on Pfizer’s implementation of this requirement. The key focus will be the approach taken with respect to loss of evaluable subjects, patient discontinuation and inclusion/exclusion errors, that is, easily measurable attribute data. We discuss a team approach to setting tolerance limits as well as lessons learned in monitoring progress and the important role of simulations in defining best practices for monitoring trials. Examples of simulation methods, signal detection through control charts suitable for short-run attribute data such as variable life adjusted displays, and other graphical methods will be demonstrated along with example code. We reflect on preferred methodology and challenges inherent in various clinical trial designs, ending with a look at future work needed to maximize the use of QTLs in mitigating trial risk and ensuring the integrity of published results.

SA-370 : A Visual Guide to Selecting Appropriate Matching Solutions in Epidemiology
Vidhya Parameswaran-Iyer, Independent

Observational studies provide meaningful clinical insights to evaluate the long-term performance of treatments in broadly inclusive real-world settings. Given the non-randomized nature of treatment assignment, it is essential to minimize bias by controlling for confounding pre-treatment variables to establish valid causal effect relationships. In studies comparing two or more groups of highly dissimilar individuals, searching for an appropriate matching solution becomes a labor-intensive and highly iterative process. Visualizing the bias-variance trade-off by plotting imbalance metrics against the number of matched individuals can aid researchers in making evidence-based analytic plan recommendations (King, 2011). In this paper, I discuss my experience with matching two groups of hemodialysis patients who were prescribed competing phosphate-lowering therapies using propensity score matching (PSM) and coarsened exact matching (CEM) techniques. The goals of this discussion are threefold: (1) present visual comparisons of the effectiveness of PSM (nearest neighbor and inverse probability of treatment weighting methods) and CEM techniques; (2) provide automated solutions using SAS® macros to carry out matching on pre-specified covariates; (3) help researchers identify the most appropriate matching solution to minimize imbalances between treatment groups.

Strategic Implementation

SI-019 : Affordably Accessing Bulk Cloud Storage for Your Analytics
Tho Nguyen, Teradata
Ken Pikulik

How do you affordably manage data at scale while still being able to use it for analytics? Hadoop made big promises and delivered an inexpensive model for storing data. Unfortunately, it never fully delivered an effective way to transform that data for use by data scientists and citizen data scientists. Today, companies turning to the cloud are headed down a similar path by investing in inexpensive bulk storage cloud-based systems such as Amazon S3 or Azure Blob. Fortunately, history doesn’t need to be repeated as SAS and Teradata can help simplify access and utilization of that data for analytics regardless of the analytic tool or where they are located. This presentation will explain how successful companies are leveraging bulk cloud storage systems to minimize data costs while still enabling users to access the data they need to perform sophisticated, advanced analytics. It will explore how companies can architect a robust analytical platform by utilizing data lakes, accessing and structuring the data through Vantage, whilst minimizing data movement in-and-out of the cloud and increasing the overall ROI of analytics.

SI-060 : Doing Less Work for Better Results - Process Optimization in DTLG Production
Wayne Zhong, Accretion Softworks

For programmers working in the pharmaceutical industry, creating Datasets, Tables, Listings, and Graphs (DTLG) is a common responsibility. The steps one is expected to follow to create these DTLGs are described in each company's process and can differ greatly between companies. The reason is that the process at each company can evolve out of different strategies and philosophies, resulting in novel workflows and macros, standards and automation. However, is there a DTLG production process that is best? How should a comparison between processes be conducted? This paper identifies common tasks encountered in producing DTLGs, describes contrasting processes adopted by the industry to perform those tasks, and discusses the strengths and weaknesses of each approach. Some metrics of comparison include the cost to implement and maintain a certain process, the effect on the final quality of the output, and the effect on the amount of labor needed to accomplish a task. These considerations may prove beneficial in determining the business needs of a company and identifying what type of process is best to meet them.

SI-170 : Moving A Hybrid Organization Towards CDISC Standardization
Kobie O'Brian, SCHARP, Fred Hutch
Sara Shoemaker, Fred Hutch / SCHARP
Robert Kleemann, Fred Hutch / SCHARP
Kate Ostbye, SCHARP, Fred Hutch

This paper discusses the experience of implementing standardization of data collection and of the development of submission-ready data sets at a unique organization at the intersection of Academia and Industry. SCHARP (Statistical Center for HIV/AIDS Research and Prevention) at Fred Hutchinson is an academic center with a nonprofit business model. It is in a unique position requiring a balance of standard regulatory reporting requirements as well as specific sponsor needs, with stakeholders including the NIH, academic centers, and nonprofit foundations. This requires a tailored approach to standardization for collecting, submitting, and analyzing data across the organization. The movement towards CDISC standardization for data collection, submission, and analysis has been encouraged for organizations that conduct clinical trials. In particular, the FDA now requires all tabulation data submitted for drug approval to be in Study Data Tabulation Model (SDTM) format. SCHARP, as a world-class statistical and data management center, has both created a data collection global library (GLIB) aligned with Clinical Data Acquisition Standards Harmonization (CDASH) and incorporated important unique case report forms to meet individual sponsor requirements. To meet these sometimes-conflicting needs, SCHARP has adopted a variation of SDTM known as SDTM +/-. This prepares SCHARP for submission to regulatory authorities, for ease of reporting, for the development of Analysis Data Model (ADaM) data sets, and for future data sharing best practices. SCHARP has successfully completed this by using an internally programmed data set development tool called Delphi to transform electronically collected data into SDTM +/- data sets.

SI-173 : PROC Future Proof;
Amy Gillespie, Merck & Co., Inc.
Susan Kramlik, Merck & Co., Inc.
Suhas Sanjee, Merck & Co., Inc.

Clinical trial programmers are key contributors to regulatory submissions, manuscripts, and statistical analyses. They operationalize analysis plans by creating high-quality, innovative and compliant reporting deliverables to address stakeholders’ needs. Clinical trial programmers are experts in authoring programming code and developing or leveraging programming standards to produce deliverables in a validated, efficient, and reproducible manner. However, the job role and work process for clinical trial programmers have stayed relatively constant in the past decades. This paper evaluates recent advances in technology and the skillsets of clinical trial programmers to identify opportunities for improved compliance and work efficiency while ultimately optimizing the programming function and potentially transforming the clinical trial programming role for continued success. Use cases leveraging natural language processing (NLP) and linked data will be explored to evaluate whether digital solutions are applicable within clinical trial programming processes. The use of different software tools and methods will also be evaluated. We expect this paper to be the first of a series of publications on this topic.

SI-205 : Redefine The Role of Analysis Programming Lead in Modern Clinical Trials
Homer Wang, Covance

Analysis programming leads (APLs) today are facing more and more challenges due to evolving production standards and submission requirements. The lack of clear APL career orientation, training, and definition results in underqualified APLs and a high rate of turnover in the APL position. This article will discuss the definitions of the traditional and the modern APL, the key requirements for a successful APL, and a proposal for modifying the APL career orientation to meet the challenges of modern clinical trial analysis, including but not limited to: 1) creating a dedicated role and career development path for APLs under the line managers; 2) creating an APL training package so that APLs are better equipped for their functions; and 3) better managing APLs from hiring to resourcing so that they can efficiently focus on the lead functions to achieve the desired quality of the products.

SI-206 : Assessing Performance of Risk-based Testing
Amber Randall, SCHARP - Fred Hutch Cancer Research Center
Bill Coar, Axio Research

Trends in the regulatory landscape point to risk-based approaches to ensure high quality data and reporting for clinical trials. Risk-based methods for validation of production programming code which assign testing methods of varying robustness based on an assessment of risk have been evaluated and accepted by some industry leaders, yet they have not been fully adopted. Some view risk-based testing as simply an attempt to save money or compensate for limited resources while claiming a minimal impact on overall quality. While that may sometimes be the case, the intent should rather be to focus finite resources on what matters most. Even with the robust gold standard of full independent reproduction, mistakes still happen. Humans make errors. Therefore, risk and consequence should be considered in choosing verification methods with a resource emphasis on those areas with greatest impact. However, the assessment of these decisions must be regularly and consistently evaluated to ensure that they are appropriate and effective. Metrics both within and across projects can be implemented to aid in this evaluation. They can report the incidence, type, and method of identification of issues found at various timepoints including internally prior to the completion of output verification, internally during final package review, and during external review. These data are crucial for the effective evaluation of the performance of risk-based testing methods and decisions.

SI-230 : Be proud of your Legacy - Optimized creation of Legacy datasets
Yevgeniy Telestakov, Quartesian LLC
Viktoriia Telestakova, Quartesian LLC
Andrii Klekov, Quartesian

This paper describes our experience in generating Legacy Datasets, in particular optimizing the process for a large number of studies that were conducted until 17 December 2016 and subsequently included in one large ISS study. The main purpose of this scope of work was to include a number of prepared legacy studies in the subsequent ISS submission along with other standard stages. We outline the main stages of the workflow, what exactly needs to be taken from the STUDY DATA TECHNICAL CONFORMANCE GUIDE, and what to pay attention to while working so as not to miss something important that is written between the lines. We also present our approaches and techniques for optimizing the standardization process for a large number of studies, with the goal of decreasing the number of resource hours involved.

SI-241 : You down to QC? Yeah, You know me!
Vaughn Eason, Catalyst Clinical Research, LLC
Jake Gallagher, Catalyst Clinical Research, LLC

The successful delivery of clinical trial-related analysis datasets and outputs is heavily dependent on an efficient and fluid relationship between the Production Programmer, QC Programmer, and Lead Statistician. Given the increasing complexity and rapidity of project delivery, coupled with multiple regulatory standards that must be adhered to, a sound strategy and clear model of communication are paramount to a study’s overall quality. Understanding the pressure and stress associated with project deadlines and nuanced sponsor requirements will help navigate communication. Further understanding of each role’s subjective nature will help outline a succinct archetype for achieving high-quality results with minimal headaches. The three main barriers to communication between Production Programmer, QC Programmer, and Lead Statistician are a vague understanding of client expectations, overall study comprehension, and the rarely considered ego factor. Tackling these issues can seem unattainable; however, we have constructed a roadmap to alleviate the friction that may occur during the delivery and quality assurance process. Some of these points include thorough ongoing communication with study leadership, purposeful and engaging meetings to eliminate opacity of overall objectives between team members, and finally, an effective way to universally communicate with any type of team member you may encounter.

SI-250 : Single Database for Pharmacometric and Clinical Data
Patricia Guldin, Merck & Co., Inc., Upper Gwynedd, PA USA
Jing Su, Merck & Co., Inc, Upper Gwynedd, PA USA

Pharmacometric analysis has become more prominent in the drug development process. Across all stages of clinical development, pharmacometric analyses are used to answer key questions including target selection, go/no-go, regimen selection, dose selection and optimization for safety and efficacy, and intrinsic/extrinsic factors. Preparing and submitting quality data in a timely and compliant manner to support these analyses and meet regulatory requirements has become essential. Often pharmacometric and clinical data are stored in separate databases, causing many challenges for source data cleaning, analysis data creation, and meeting regulatory submission requirements. This paper will describe the end-to-end data flow and the benefits and challenges of using a single database for both pharmacometric and clinical data.

SI-374 : Automating Clinical Process with Metadata
Paul Slagle, Clinical Solutions Group
Eric Larson, Clinical Solutions Group

The value of a systematic approach to using metadata has been presented and hyped for many years, with a number of vendors creating tools, metadata repositories, to support the use of metadata. Companies have bought into this by purchasing these tools and investing months in their creation and management, with the end result being nothing more than an automated tool for creating Excel spreadsheets that teams can use for documenting specifications. So it's not a big surprise that the concept of metadata management is less than positively received. The use of graph databases along with metadata repositories has created a better opportunity for building automation into the clinical process. Last year, Paul and Eric presented how such a tool could be used for managing the creation of data collection tools for clinical trials. This paper takes the concept to the next level by explaining, and showing, how the metadata in a graph database can be used from study development through ADaM creation. By leveraging metadata, the traceability within the clinical trial can be improved to support an expedited submissions process. This paper will walk through the concepts and the stages of clinical trial creation, SDTM creation, and ADaM dataset creation.

Submission Standards

SS-030 : End-to-end Prostate-Specific Antigen (PSA) Analysis in Clinical Trials: From Mock-ups to ADPSA
Joy Zeng, Pfizer
Varaprasad Ilapogu, Ephicacy Consultancy Group
Xinping Cindy Wu, Pfizer

Prostate-specific antigen (PSA) level is a key biomarker in prostate cancer that has been used in standard guidelines as a measurement of clinical outcomes for patients with prostate cancer. This paper aims to provide an end-to-end overview of the programming aspects of PSA-related trials. We describe the concepts of PSA response and time to PSA progression, two important end points in assessing efficacy of prostate cancer trials, along with the statistical methods involved in estimating the distribution of time to PSA progression. The paper also addresses the design of metadata from PSA-related mock-up tables and presents the considerations involved in the creation of CDISC-compliant ADPSA dataset based on the metadata. Programming in the oncology therapeutic area is highly specialized and we hope this paper serves as a one-stop shop for providing the necessary tools to navigate through it.

SS-045 : Updates in SDTM IG V3.3: What Belongs Where – Practical Examples
Lucas Du, Vertex Pharmaceuticals INC
William Paget, Vertex Pharmaceuticals INC
Lingyun Chen, Vertex Pharmaceuticals INC
Todd Case, Vertex Pharmaceuticals Inc

The CDISC SDTM Implementation Guide (IG) Version 3.3 was released on 11/20/2018. New domains and implementation rules have been added to standardize SDTM implementation within the industry. Compared to version 3.2, a lot of information was updated during the 5 years between releases. This also makes it a great challenge for people working in Pharma/Biotech to figure out all the details. For example, what are the new domains and how should we use them? Furthermore, the same information may map to different domains depending on its purpose, so how should we decide which domain the information should go to? Also, in Trial Summary (TS), Comments (CO), Trial Inclusion/Exclusion Criteria (TI) or other general observation classes, variables with text longer than 200 characters will be mapped to the corresponding SUPP domain, but the labels for variables in the SUPP domain vary within the industry. In addition, it’s not clear how to populate the EPOCH variables in events, findings and interventions domains or how to deal with subjects in the DM domain who are randomized but never dosed. In this paper, updates will be highlighted, and examples will be provided.

SS-054 : RTOR: Our Side of the Story
Shefalica Chand, Seattle Genetics, Inc.
Eric Song, Seattle Genetics, Inc.

FDA’s Real-Time Oncology Review (RTOR) pilot program was initially introduced in June 2018 for supplemental New Drug Applications (NDA) and supplemental Biologics License Applications (BLA) and more recently extended to original NDAs and BLAs. This presents a new ray of hope for cancer patients, as the program aims to expedite review of oncological submissions with improved efficiency and quality by allowing FDA earlier access to clinical safety and efficacy data and results, especially those related to Biometrics. This in turn may help expedite availability of novel treatments to cancer patients. Seattle Genetics participated in the RTOR pilot program for a supplemental BLA for ADCETRIS ECHELON-2 in CD30-expressing PTCL, which received approval in an unprecedented 11 days from sBLA submission. Against the backdrop of our positive RTOR experience, this paper will provide a background of the program, its eligibility criteria, and its success so far. We will give you insights into:
• How and why a submission can be accepted into this program
• FDA’s RTOR expectations and how they evolved since our sBLA to now
• Effective communication and collaboration within our organization and with FDA
• Seamless preparation and planning to enable rapid submission and review
• Post-submission activities and efficient handling of regulatory questions
• The pivotal role of Statistical Programming and best practices towards perpetual submission-readiness
We are excited to share our story as well as insights into more recent RTOR developments to help colleagues in industry be optimally prepared to get drugs to cancer patients faster!

SS-081 : Why Are There So Many ADaM Documents, and How Do I Know Which to Use?
Sandra Minjoe, PRA Health Sciences

As of this writing, the CDISC website has the following ADaM documents for download: a model document, three versions of the implementation guide, an adverse event data structure, an occurrence data structure, a time-to-event document, a document with examples in commonly used statistical analysis methods, an analysis result metadata document, conformance rules, and an important considerations document. Additionally, you can download three release packages, each containing a subset of these documents. This paper describes why there are so many documents, walks through basic information contained in each, and makes recommendations of which set of documents to use in which circumstances.

SS-094 : Implementation of Metadata Repository & Protocol Automation at Bayer Pharma
Girish Rajeev, Bayer Pharmaceuticals

Our company's standards department has recently successfully implemented a Metadata Repository (MDR) for global use across Bayer Pharma for defining, governing, managing and consuming structured medical standards. The standards are used for several upstream and downstream processes, and the MDR is used to send machine-readable standards to other systems, such as Electronic Data Capture and Clinical Data Repository, hosted either internally at Bayer or externally by vendors. In parallel, Bayer deployed a structured Protocol Authoring Tool (PAT). The PAT keeps track of constraints and connections between study design elements such as Objectives, Endpoints/Variables, Inclusion/Exclusion Criteria, Schedule of Activities and Trial Parameters. With native integration to the MDR, company- and industry-approved clinical trial standards can be incorporated early on in the study design process. Users can plan and author Clinical Trial Protocol documents on demand using Bayer-specific or industry-standard templates such as TransCelerate’s Common Protocol Template (CPT) and the NIH-FDA Clinical Trial Protocol Template. This presentation will discuss our vision for MDR and PAT, implementation of these systems, the process change that was needed and the efficiencies that were gained.

SS-095 : Masters of the (Data Definition) Universe: Creating Data Specs Consistently and Efficiently Across All Studies
Michael Hagendoorn, Seattle Genetics, Inc.
Ellen Lin, Seattle Genetics, Inc.

Creating specifications for SDTM and ADaM data sets is not for the faint of heart. The brave author must reference many study documents like analysis plans and table shells, internal and external data standards, myriad regulatory guidance from different agencies, and a bewildering plethora of suggestions for spec templates and tools spilling onto the screen in a quick internet search. Over the course of many long weeks, all these are arduously curated into a comprehensive specification for the study at hand, so the study programming team can charge ahead! Meanwhile, two offices down, another author is going through the same painful process for their own study… and so on. Compared to capturing specifications separately for each study in this manner, a more optimal approach is to establish a single master data specification at the compound level that provides all metadata for SDTM, ADaM, and controlled terminology for all studies and integrations on the product. This setup realizes many advantages, including faster delivery of specs in higher quality across all studies with instant visibility into variations in mapping and derivations; easier safety and efficacy integrations (ISS/ISE); enhanced adoption of standards; improving compliance and facilitating regulatory review; huge efficiency gains through automatic application of consistent metadata in SAS programs; and potential for powerful downstream metadata-driven output generation. We will present a simple yet elegant and comprehensive design anyone can immediately leverage to unleash the power of compound-level master data specifications!

SS-097 : Data Review: What’s Not Included in Pinnacle 21?
Jinit Mistry, Seattle Genetics
Lyma Faroz, Seattle Genetics
Hao Meng, Seattle Genetics Inc.

Many pharmaceutical and biotechnology companies outsource statistical programming activities and submission package preparation to CROs. Still, for all programming deliverables the sponsor remains responsible for quality, completeness, and compliance with published standards and regulatory guidance. This makes it critical for the sponsor to implement efficient vendor oversight that touches on sufficient detail to ensure quality of the product provided by the CRO. Sponsors are widely using the Pinnacle 21 toolset to ensure SDTM and ADaM compliance with CDISC guidance. However, by itself this is not sufficient, and additional review of how the CRO adopted CDISC implementation guides to assign or derive variables in alignment with study design, protocol, and SAP needs to be conducted beyond Pinnacle 21 reports. For example, Pinnacle 21 checks whether variable values are present and runs several logic and interdependency checks, but it doesn’t validate the correctness and accuracy of such values in relation to study documents and other specifications or constraints. This paper will share various CDISC data validation checks that can be performed outside of Pinnacle 21 to significantly heighten the quality of any submission and help mitigate review questions and technical rejections.

SS-134 : Essential steps to avoid common mistakes when providing BIMO Data Package
Chintan Pandya, Merck & Co.
Majdoub Haloui, Merck & Co. Inc.

CDER’s Bioresearch Monitoring (BIMO) team is responsible for verifying the integrity of clinical data submitted in regulatory applications and supplements and for determining whether trial conduct complies with FDA regulations and statutory requirements. Per the FDA Draft Guidance for Industry, CDER’s BIMO inspectors and the Office of Regulatory Affairs (ORA) identify sites of interest from all major pivotal studies within the submission. This paper will provide essential steps to avoid common mistakes in the BIMO package that could trigger an FDA IR or delay approvals. In addition, this paper will provide a general approach and tips to cross-check safety and efficacy counts against actual CSR counts.

SS-140 : Pinnacle 21 Community v3.0 - A Users Perspective
Ajay Gupta, PPD Inc

Pinnacle 21, previously known as OpenCDISC Validator, provides extensive compliance checks against CDISC outputs like SDTM, ADaM, SEND and Define.xml. This validation tool produces a report in Excel or CSV format which contains information categorized as errors, warnings, and notices. In May 2019, the Pinnacle 21 team released Community v3.0. This paper will provide an overview of all major updates in Pinnacle 21 Community v3.0, e.g. new validation checks, ADaM IG v1.1 support, and latest Controlled Terminology support. The paper will then cover the installation process for the SNOMED and MedDRA dictionaries, which is no longer supported by Pinnacle 21.

SS-150 : Challenges and solutions for e-data submission to PMDA even after submission to FDA
Akari Kamitani, Shionogi
HyeonJeong An, Shionogi Inc.
Yura Suzuki, Shionogi & Co., Ltd.
Malla Reddy Boda, Shionogi Inc.
Yoshitake Kitanishi, Shionogi & Co., Ltd.

Electronic data submission with an NDA to FDA is already mandatory, and it is also mandatory for applications to PMDA from April 2020. The data to be submitted must be CDISC compliant. As a result, it tends to be assumed that submission to PMDA is easy once e-data have been submitted to FDA. In fact, this is not true. Our company (SHIONOGI) has its headquarters in Japan and group companies in the US and Europe, and promotes drug development globally. Therefore, we expect to submit to PMDA after we submit to FDA. In this paper, we examine the differences between the two, taking a specific submission as an example. Specifically, there were differences in validation rules, target dates, documents for consultation, and so on. Building on this examination, we aim to submit data efficiently regardless of whether the application goes first to FDA or to PMDA. For this reason, when developing deliverables for each clinical trial, it is necessary to move toward preparing one package that meets the rules of both authorities. Intended Audience: Anyone in the industry who is interested in data package preparation for PMDA and FDA

SS-151 : Supplementary Steps to Create a More Precise ADaM define.xml in Pinnacle 21 Enterprise
Majdoub Haloui, Merck & Co. Inc.
Hong Qi, Merck & Co Inc.

The analysis data definition document, ADaM define.xml, is a required document in a regulatory submission package. It provides the information needed to describe the submitted ADaM datasets and their variables. A high-quality define.xml is important for a smooth review process. Pinnacle 21 Enterprise enables automation and standardization in generating a high-quality define.xml. However, due to certain limitations of the current version of the Pinnacle 21 Enterprise software, extra steps are needed to create a more precise define.xml after the first import of the ADaM dataset specification. These steps generate an ADaM define.xml with a better description of the attributes, controlled terms, and the source for certain variables. In this paper, the authors will introduce detailed steps leading to a more accurate ADaM define.xml file.

SS-156 : Analysis Package e-Submission – Planning and Execution
Abhilash Chimbirithy, Merck & Co.
Saigovind Chenna, Merck & Co., Inc
Majdoub Haloui, Merck & Co. Inc.

The analysis package is one of the e-Submission components submitted to regulatory agencies as part of INDs, NDAs, ANDAs, BLAs and sBLAs. It contains the study analysis data and related files in a standardized electronic format. Proper planning and resources are required to create this package, which has several individual components such as datasets in XPT format, define, analysis results metadata (ARM), data reviewer's guide, and programs. These components are separate deliverables but interrelated. Drawing on frequently identified challenges, questions and issues from previous studies, we have provided guidance on planning, setting deliverable timelines, identifying team members' responsibilities and ensuring compliance with the data standards that regulators require for the submitted components. In this paper, we will present details of best practices, proper planning and checklists that can help teams efficiently create an analysis package. The paper also highlights ways to achieve effective cross-functional collaboration and consistently meet regulatory compliance.

SS-159 : Automating CRF Annotations using Python
Hema Muthukumar, Statistical Center For HIV/AIDS Research and Prevention (SCHARP) at Fred Hutch
Kobie O'Brian, SCHARP, Fred Hutch

When data are submitted to the FDA, an Annotated Case Report Form (aCRF) is to be provided as a PDF document, to help reviewers find the origin of data included in submitted datasets. These annotations should be simple, clean, and should respect appearance and format (color, font) recommendations. aCRFs are traditionally done manually. This involves using a text editor in PDF and working variable by variable across many pages. This is a time-consuming process that can take many hours. In addition, maintaining consistency across pages requires substantial effort. This paper describes an effective way to automate the entire aCRF process using Python. This approach automatically annotates the variables on the CRF next to their related questions on the appropriate pages. In this method, we use the following: a Study Design Specification, which is an Excel sheet of the study details as built by an Electronic Data Capture (EDC) system; an SDTM mapping specification, which is also an Excel sheet; and the study case report form in PDF format. The output of this method is an FDF file, which is used to automatically create the final aCRF. This method significantly reduces the time and effort required to create an aCRF while eliminating inconsistent annotations. It is also flexible and can be implemented to annotate CRFs for different types of trials and organizations.

SS-160 : Submission Standards: A SAS Macro approach for submission of software programs (*.sas)
Sarada Golla, PPD

Study Data Technical Conformance Guide provides specifications, recommendations, and general considerations on how to submit standardized study data using FDA-supported data standards located in the FDA Data Standards Catalog. The guide provides information for sponsors to submit data in electronic format. eCTD Technical Conformance Guide provides specifications, recommendations, and general considerations on how to submit electronic Common Technical Document (eCTD)-based electronic submissions to the Center for Drug Evaluation and Research (CDER) or the Center for Biologics Evaluation and Research (CBER). Sponsors are to provide the software programs used to create all ADaM datasets and generate tables and figures associated with primary and secondary efficacy analyses. The main purpose of requesting the submission of these programs is to understand the process by which the variables were created for the respective analyses and to confirm the analysis algorithms, derivations, and results. Sponsors are asked to submit software programs in ASCII text format and not just send the executable files along with the submission. This paper focuses on the application of a SAS macro developed specifically for submission of software programs as part of submission to eCTD Module 5 ensuring eCTD compliant naming conventions and file formats.

SS-197 : Preparing a Successful BIMO Data Package
Elizabeth Li, PharmaStat, LLC
Carl Chesbrough, PharmaStat, LLC
Inka Leprince, PharmaStat, LLC

In order to shorten the time for regulatory review of a new drug application (NDA) or biologic license application (BLA), more and more biotech and pharmaceutical companies prepare their Bioresearch Monitoring Program (BIMO) packages as part of their initial submissions. In this paper, we walk the reader through a process of producing BIMO information, particularly the subject-level data line listings by clinical site (by-site listings) and the summary-level clinical site (CLINSITE) dataset. This paper concludes with methods of preparing electronic Common Technical Document (eCTD) documentation, such as data definition (define.xml) and the reviewer’s guide, to support the CLINSITE dataset. In addition, we discuss challenges as we share our experience in planning, producing, and quality control (QC) for a successful BIMO package.

SS-200 : Machine Readable Data Not Required for EMA… Really?
David Izard, GlaxoSmithKline

Per current European Medicines Agency (EMA) regulations the submission of machine-readable data and related documentation is not part of a regulatory filing to EMA. To the contrary, during a routine audit as part of the review cycle for a product being considered by EMA, our organization received a number of cascading requests that, in the end, mimicked a typical electronic submission of data. This paper will discuss our experience, how we engaged with regulators and how we managed the delivery of these items under tight timelines.

SS-287 : Looking Back: A Decade of ADaM Standards
Trevor Mankus, Pinnacle 21

Since ADaM-IG 1.0 was first published in 2009, the ADaM standard has evolved greatly. A decade later, in October 2019, ADaM-IG 1.2 was published and is not yet accepted by any regulatory agency as of this paper’s publication. Additionally, there is often a delay between the date of standard publication and rule publication. It can be difficult to submit to multiple regulatory agencies which support different Implementation Guide versions. This paper will summarize major differences between IG versions and rules. It will also address the challenges that the industry faces when publications of standards are out of sync with agency adoption and rule publication.

SS-291 : Confusing Data Validation Rules Explained, Part 2
Michael Beers, Pinnacle 21

The 2018 PharmaSUG paper 'Confusing Data Validation Rules Explained' by Pinnacle 21 discussed some of the more confusing data validation rules, including the possible sources of confusion, and clarifications on the intent and purpose of the validation rule. This paper will discuss an additional set of confusing validation rules and explain what the resulting validation issues would mean.

SS-292 : Split Character Variables into Meaningful Text
Savithri Jajam, Chiltern

All clinical trial data must follow CDISC standards in order to go for submission, and one of the primary criteria is to restrict variable lengths to 200 characters. One of the challenging tasks in following CDISC guidelines is splitting long text strings into multiple text strings of 200 characters or fewer without breaking words. There are many different approaches toward achieving this, some of which will be discussed in this paper with examples.
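
The word-boundary splitting described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than SAS, it is not the paper's method, and the function name `split_text` and the sample comment are hypothetical. It relies on the standard library's `textwrap.wrap`, which normalizes whitespace and, with the options shown, never breaks inside a word (so a single word longer than the limit would be returned as an over-length chunk).

```python
import textwrap


def split_text(text, max_len=200):
    """Split text into chunks of at most max_len characters at word boundaries.

    Assumption: whitespace may be normalized, and no single word is longer
    than max_len (such a word would be returned unbroken and over-length).
    """
    return textwrap.wrap(
        text,
        width=max_len,
        break_long_words=False,   # never cut a word in the middle
        break_on_hyphens=False,   # keep hyphenated terms together
    )


# Hypothetical long comment: 60 seven-letter words, 479 characters in total.
comment = " ".join(["adverse"] * 60)
parts = split_text(comment)
for number, part in enumerate(parts, start=1):
    print(number, len(part))
```

Each chunk could then be stored in its own variable or supplemental record, depending on the convention the study follows.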

SS-317 : Improving the Quality of Define.xml: A Comprehensive Checklist Before Submission
Ji Qi, BioPier Inc.
Yan Li, BioPier Inc.
Lixin Gao, BioPier Inc.

The define.xml is the cover letter in Module 5 of the electronic Common Technical Document (eCTD) submission to the U.S. Food and Drug Administration (FDA) which provides a high-level summary of the metadata for all the data submitted. A functioning, complete and informative define.xml is required by FDA regulation. A high-quality define.xml will not only aid in the ease of FDA review but also convey the attentiveness of the sponsor’s work attitude to the reviewers and augment their trust in the results. In this paper, we will provide a list of details to check and fix before submission of the define.xml, focusing on those for clinical tabulation data (SDTM) and analysis data (ADaM), based on a review of common issues reported in papers as well as our own experience generating define.xml for clients.

SS-321 : No more a trial by fire! Trial summary made easy!!
Bhargav Koduru, Sarepta Therapeutics Inc.

The Trial Summary Information (TS) domain has always been the dreaded domain that programmers approach only when the final trial data submission is due. This is partly because TS has many moving parts in the form of standalone reference dictionaries and specifications such as NDF-RT, UNII, SNOMED, etc. On top of this, drafting the TS requires a thorough review of the Statistical Analysis Plan (SAP), protocol, and study-related data. Also, in the last two years, FDA’s “Study Data Technical Conformance Guide” has increasingly mentioned TS and has been either introducing or clarifying various TSPARMCDs, including in the standalone Appendixes (B and C) introduced in October 2018. The importance of TS is underscored by the fact that 27% of the FDA validator rules (v 1.3) correspond to TS. This paper will serve as an end-to-end guide to answer all your questions about identifying various TSPARMCDs, with relevant examples, taking your TS to the next level, beyond the skeleton list of TSPARMCDs required to check off the FDA validator rules. Also, through this paper, I will provide a deep dive into the relevant updates in TSPARMCDs in the last two years, picking up where my previous PharmaSUG paper “Trial Summary: The Golden Gate towards a Successful Submission” left off in 2018.

SS-325 : Getting It Right: Refinement of SEND Validation Rules
Kristin Kelly, Pinnacle 21

The CDISC SDTM metadata, outlined in the SDTM Model, are used for submission of data from both clinical trials and nonclinical studies. Until recently, many of the Pinnacle 21 validation rules were assigned for both SDTM and SEND domains when in some cases, a specific rule did not apply for SEND data as outlined in the SENDIG. Over the past year, the SEND rule set has been refined through the modification of existing rules, removal of others and creation of new rules. All rules are based on either an FDA Business rule, an FDA Validator rule, or CDISC rules. This paper will discuss some of the changes that have been made in an effort to ‘get the rules right’ for SEND.

SS-327 : Step-by-step guide for a successful FDA submission
Lingjiao Qi, Statistics & Data Corporation
Bharath Donthi, Statistics & Data Corporation

Clinical trial data submission is vital for regulatory approval in the pharmaceutical industry. Preparing a high-quality, comprehensive and integrated data submission package is a challenging task as the programmer must ensure the submitted package is compliant with all applicable regulatory standards. This paper is designed as a step-by-step guide to identify and implement regulatory requirements for study data submission in an efficient manner. First, we will review key requirements in several FDA documents on submission data standards, such as the Electronic Common Technical Document (eCTD) Guidance, Guidance on Standardized Study Data (eStudy Guidance), Study Data Standardization Plan, and Study Data Technical Conformance Guide. For example, per these FDA resources, submitted data should be compliant with CDISC SDTM and ADaM standards, associated with supportive submission documents (acrf.pdf, define.xml, csdrg.pdf, adrg.pdf, etc.), and organized into a specific file directory structure. After explaining key regulatory requirements for data submission, we will share best practices to ensure data quality is maintained throughout the preparation and submission of the study data package. This step-by-step guide will assist readers in delivering an efficient, high-quality, and fully-compliant data submission package to the FDA.

SS-371 : Deciphering Codelists - Better Understanding and Handling of Controlled Terminology Metadata in Define.xml and in Reviewer’s Guides
Hariprasath Narayanaswamy, Pharma Analytics LLC

The FDA Study Data Technical Conformance Guide Technical Specifications document requires that each submitted dataset have its contents described with complete metadata in the data definition (define.xml) file. Metadata is explanatory data about study ‘data elements’ (collected study data, tabulated in SDTM datasets, or analysis data, documented in ADaM datasets). It is said to “give meaning to data” or to put data “in context”. One of the challenges in documenting metadata in a define for submission is to ensure that the document clearly provides a unique set of values that the data elements represent, one that enables the reviewer to quickly understand the spectrum of information designed to be collected or documented. This is accomplished by providing specific Controlled Terminology (CT) for the questions (such as --TESTs), responses (such as --STRESCs) and other qualifiers (such as SEX, --STRESUs, --SPECs, --METHODs). This paper analyzes the requirements for CTs, their sources, their handling and the common issues in their implementation. The first part of this paper reviews FDA and CDISC guidelines on CT implementation, different types of CT available, the components of CDISC CT, the relationships between various codelists in CDISC CT, the various external dictionaries and their usage. In the second part of this paper, we will present suggestions on implementation of various commonly used codelists, with examples. In the third part, we review commonly encountered Pinnacle 21 validation issues relating to codelists. And finally, we will go into the requirements on documenting CT versions and exceptions in the Reviewer’s Guide.


EP-064 : A Guide for the Guides: Implementing SDTM and ADaM standards for parallel and crossover studies
Azia Tariq, GlaxoSmithKline
Janaki Chintapalli, GlaxoSmithKline

In clinical studies, dataset structures are heavily impacted by the study design and how treatment groups are compared. The two most common study designs used in clinical research are parallel and crossover. In a parallel study, participants are randomly assigned to a single treatment. Each treatment can include a placebo, a specific dose of the drug being investigated or a standard-of-care treatment. A crossover study design, on the other hand, randomly assigns participants to a specified sequence of treatments. When one treatment is completed, the subject will then “cross over” to another treatment during the course of the trial, resulting in each subject acting as its own control. Typically, all subjects will receive the same number of treatments and be involved in the same number of periods. This means that even if participants are initially put into a placebo group, they will also eventually receive the study drug or standard-of-care during the trial. Usually, a crossover study also includes a washout period, which allows the effects of the preceding treatments to dissipate and eliminates any carry-over effect. The washout period is a predetermined amount of time during which patients receive no treatment. Parallel studies are straightforward when assigning treatments and deriving other analysis variables. Crossover studies require some additional work when creating treatment variables and other analysis variables. This paper will examine both study designs and explain how CDISC implementation differs between parallel and crossover studies.

EP-099 : Color Data Listings and Color Patient Profiles
Charley Wu, Atara Biotherapeutics

During clinical trials, there are frequent datacuts for safety data review, interim data analysis, conference presentations, CSR, etc. Medical Monitors, Statisticians, Clinical Data Managers, and Pharmacovigilance staff usually need to review the data carefully to ensure data accuracy and integrity. These functions frequently complain that they have already reviewed the same data many times before, and they don’t like to review the same data repeatedly. They would rather pay more attention to new and updated data. However, most data listings/patient profiles/reports cannot tell which data are new and which are old. To solve this issue, we developed color data listings and color patient profiles. The idea is that we set the first datacut as the benchmark, and all future data changes are then highlighted with different colors. For example, updated data are colored yellow. New data are colored green. Deleted data are colored grey. Unchanged data are not colored. By doing so, reviewers can easily identify any new, updated, or deleted data since the last datacut. Though we have only implemented this in regular data listings and patient profiles, we have already received very positive feedback from data reviewers. It used to take them 1-2 weeks to finish reviewing all data listings and patient profiles. They can now finish data review in a couple of days. It also makes data review an enjoyable process as colored data changes pop out to the reviewer’s eyes. Please see sample output attached.
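
The comparison behind such color coding can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function `classify_changes`, the record keys, and the sample data are hypothetical, and records are assumed to be identified by a unique key. Only the color assignments (yellow, green, grey, none) come from the abstract.

```python
# Color per change status, as described in the abstract.
HIGHLIGHT = {"new": "green", "updated": "yellow", "deleted": "grey", "unchanged": None}


def classify_changes(previous, current):
    """Compare two datacuts (dicts keyed by record ID) and label each record."""
    status = {}
    for key, row in current.items():
        if key not in previous:
            status[key] = "new"
        elif row != previous[key]:
            status[key] = "updated"
        else:
            status[key] = "unchanged"
    # Records present in the benchmark cut but missing now were deleted.
    for key in previous:
        if key not in current:
            status[key] = "deleted"
    return status


# Hypothetical adverse event records keyed by (subject, sequence number).
benchmark = {("001", 1): "HEADACHE", ("001", 2): "NAUSEA", ("002", 1): "RASH"}
latest = {("001", 1): "HEADACHE", ("001", 2): "VOMITING", ("003", 1): "FATIGUE"}

changes = classify_changes(benchmark, latest)
for key in sorted(changes):
    print(key, changes[key], HIGHLIGHT[changes[key]])
```

A reporting step would then apply the color for each status when rendering the listing.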

EP-168 : One Graph to Simplify Visualization of Clinical Trial Projects
Zhouming (Victor) Sun, AstraZeneca

Managing projects during clinical trials is very challenging, especially when multiple projects/studies are involved and each study requires various deliveries on its own timeline. For a statistical programming project leader, planning and monitoring are therefore key to ensuring that all studies within each project are on track and that all deliveries are on time and of high quality with optimized resources. This paper presents a single comprehensive graph that simplifies visualization of multiple projects/studies while including key timelines and major information for tracking the progress of each individual study in real time. The graph is generated using SAS Graph Template Language (GTL) so that resourcing, task assignments, deliveries, and study status across multiple projects can be visualized and managed more efficiently.

EP-172 : TDF – Overview and Status of the Test Data Factory Project, Standard Analyses & Code Sharing Working Group
Nancy Brucken, Clinical Solutions Group
Peter Schaefer, VCA-Plus, Inc.
Dante Di Tommaso, Omeros

Test Data Factory is one of six projects within PhUSE’s Standard Analyses and Code Sharing Working Group. Suitable test data are an essential part of software development and testing. The objective of the TDF Project is to provide up-to-date CDISC-compliant data sets to empower statistical programmers and software developers. Users should be able to customize fundamental aspects of test databases. The TDF Project team have published two data packages based on SDTM and ADaM data sets that CDISC published in a pilot. Now the TDF team have begun to implement SAS and R code to simulate a clinical trial database based on user configuration. PhUSE is a volunteer organization that relies on community contribution to progress initiatives such as TDF. This poster and paper inform the community of TDF history, current activities and future plans, and have the secondary intent of inspiring community members to join our efforts and to contribute their expertise.

EP-174 : Standard Analyses and Code Sharing Working Group Update
Nancy Brucken, Clinical Solutions Group, Inc.
Dante Di Tommaso, Omeros
Jane Marrer, Merck
Mary Nilsson, Eli Lilly and Company
Jared Slain, MPI Research
Hanming Tu, Frontage

This paper updates the community on the efforts of the six project teams in the PHUSE Standard Analyses and Code Sharing Working Group. The Working Group publishes recommended analyses of clinical data suitable across therapeutic areas. These publications include presentations (tables, listings and figures) of the results from those analyses. The Working Group's GitHub repository contains a wealth of scripts that have been written by PHUSE members, or developed and contributed by the FDA and other organizations. The collaborative efforts of this group improve our collective efforts to design and implement transparent and robust analyses of our clinical data for regulatory decision making. Crowd-sourcing code development of these recommended analyses can promote access to and adoption of these analyses, and bring efficiencies and savings to our drug development and review processes.

EP-175 : Use of Cumulative Response Plots in Clinical Trial Data
Jane Lu, AstraZeneca Pharmaceutical

FDA's guidance document, Clinical Studies Section of Labeling for Human Prescription Drug and Biological Products, recommends graphical presentation of study data by cumulative distribution of responses among individual subjects. As a snapshot of all the data, a cumulative response plot provides more detailed information than routine summary statistics. It illustrates the entire spectrum of treatment differences between active drug and placebo or other comparators. The plot can be used to estimate clinical changes, characterize treatment effects, determine response thresholds, explore effects of sample sizes, etc. This paper presents two easy methods to generate cumulative response plots using SAS.
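
The quantity plotted in a cumulative response plot can be sketched in a few lines. This is an illustrative Python sketch, not either of the paper's SAS methods. The function name `cumulative_response` and the sample values are hypothetical, and the sketch assumes that larger values mean a better response, so each point gives the proportion of subjects whose response is at least the threshold.

```python
def cumulative_response(values):
    """Return (threshold, proportion of subjects with response >= threshold).

    Assumption: larger values indicate a better response.
    """
    n = len(values)
    return [(x, sum(v >= x for v in values) / n) for x in sorted(set(values))]


# Hypothetical percent improvement from baseline for two small treatment arms.
placebo = [0, 5, 10, 10]
active = [10, 20, 20, 40]

for arm, values in (("placebo", placebo), ("active", active)):
    for threshold, proportion in cumulative_response(values):
        print(arm, threshold, proportion)
```

Plotting the proportion against the threshold as one step curve per arm shows the treatment difference at every possible response cutoff rather than at a single responder definition.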

EP-176 : 10 things you need to know about PMDA eSubmission
Yuichi Nakajima, Novartis

From April 2020, PMDA mandates electronic submission for new drug applications in Japan. PMDA has published the Basic Principles, Notification on Practical Operations, Technical Conformance Guide and FAQs on electronic submission for applicants. Although those guidance documents cover general topics, many operational and technical challenges were found in the transitional period from October 2016 to March 2020. FDA and PMDA are different. Needless to say, an electronic Case Report Tabulation (eCRT) package accepted by FDA is not always accepted by PMDA. It is difficult to define a “golden standard” for electronic submission because submission scenarios vary. This poster will provide several tips and points of awareness, obtained from actual experience during the transitional period, to support your smooth PMDA submission.

EP-177 : Detecting Side Effects and Evaluating the Effectiveness of Drugs from Customers’ Online Reviews using Text Analytics, Sentiment Analysis and Machine Learning Models
Thu Dinh, Oklahoma State University
Goutam Chakraborty, Oklahoma State University

Drug reviews play a very significant role in providing crucial medical care information for both healthcare professionals and consumers. Customers are increasingly utilizing online review sites, discussion boards and forums to voice their opinions and express their sentiments about drugs they have used. However, a potential buyer typically finds it very hard to go through all comments before making a purchase decision. Another big challenge is the unstructured, qualitative, and textual nature of the reviews, which makes it difficult for readers to turn the comments into meaningful insights. In light of that, this paper primarily aims at classifying the side effect level and effectiveness level of prescribed drugs by utilizing text analytics and predictive models within SAS® Enterprise Miner™. Additionally, the paper explores the specific effectiveness and potential side effects of each prescription drug through sentiment analysis and text mining within SAS® Sentiment Analysis Studio and SAS® Visual Text Analytics. The study’s preliminary results show that the best performing model for side effect level classification is the rule-based model, with a validation misclassification rate of 27.1%. For effectiveness level classification, the text rule builder model also works best, with a 22.4% validation misclassification rate. These models are further validated using a transfer learning algorithm to evaluate performance and generalization. The results can act as practical guidelines and useful references to help prospective patients make better informed purchase decisions.

EP-237 : CTCAE up-versioning – a simple way to deal with the complexity of lab toxicity grading
Jianwei Liu, AstraZeneca

The NCI Common Terminology Criteria for Adverse Events (CTCAE) consist of a list of grading criteria that can be utilized for lab toxicity grading in oncology. However, the criteria vary among the different versions of CTCAE, and programming rules need to be adjusted accordingly when implementing a newer version of CTCAE for lab toxicity grading. For example, CTCAE v5.0 requires certain lab tests to be graded based on baseline status, in addition to comparison with the LLN (lower limit of normal) or ULN (upper limit of normal). Hence programming for lab toxicity grading for CTCAE v5.0 can be more complicated than for CTCAE v4.03. This paper will illustrate the differences between CTCAE v4.03 and CTCAE v5.0, as well as provide simple and straightforward programming logic for lab toxicity grading, which can be used for any future up-versioning.

EP-268 : Look Up Not Down: Advanced Table Lookups in Base SAS
Jayanth Iyengar, Data Systems Consultants LLC
Josh Horstman, Nested Loop Consulting

One of the most common data manipulation tasks SAS programmers perform is combining tables through table lookups. In the SAS programmer’s toolkit many constructs are available for performing table lookups. Traditional methods for performing table lookups include conditional logic, match-merging and SQL joins. In this paper we concentrate on advanced table lookup methods such as formats, multiple SET statements, and HASH objects. We conceptually examine what advantages they provide the SAS programmer over basic methods. We also discuss and assess performance and efficiency considerations through practical examples.
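
The idea behind a key-based in-memory lookup, which is what a hash object provides, can be sketched as follows. This Python sketch is only an analogy, not SAS code and not the authors' examples: the function `lookup_join`, the field names, and the sample data are all hypothetical.

```python
def lookup_join(records, lookup, key, new_field, default=None):
    """Add new_field to each record by looking up its key in an in-memory table."""
    # Load the lookup table once into a dictionary keyed by the join variable.
    table = {row[key]: row[new_field] for row in lookup}
    # Each record is then resolved with a single keyed access, with no sorting.
    return [{**rec, new_field: table.get(rec[key], default)} for rec in records]


# Hypothetical site lookup table and subject-level records.
sites = [{"siteid": "101", "country": "USA"}, {"siteid": "202", "country": "JPN"}]
subjects = [
    {"usubjid": "S-001", "siteid": "101"},
    {"usubjid": "S-002", "siteid": "202"},
    {"usubjid": "S-003", "siteid": "999"},  # no matching site
]

joined = lookup_join(subjects, sites, key="siteid", new_field="country")
for row in joined:
    print(row)
```

Unlike a match-merge, this approach needs neither input sorted by the key, at the cost of holding the lookup table in memory.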

EP-270 : My Favorite SAS® Tips, Tricks and Techniques
Nirmal Balasubramanian, Quartesian Clinical Research
Praveenraj Mathivanan, Quartesian Clinical Research

If you are a SAS programmer, you may face challenges in day-to-day programming activities. This paper shares quick tips, workaround macros, and logic that ease programming tasks and take little time to learn and implement. Included are: wonders of the compute block in PROC REPORT; customisation of a figure’s legend with the help of LegendItem and Layout GlobalLegend; workaround macros for day-to-day activities; programming logic that takes the worry out of pagination; a small workaround in decimal precision to increase coding efficiency; little-known functions that get rid of messy code and save time; tricks in SAS EG to ease the work; Excel functionalities to make tasks easier; and ways to make life easier using the command line in Windows and Unix.

EP-337 : Generating ADaM compliant ADSL Dataset by Using R
Vipin Kumpawat, Eliassen Group
Lalitkumar Bansal, Statum Analytics LLC

SAS has been widely used for generating clinical trial data sets, whereas R is used for data analysis and has gained some popularity among statisticians and programmers. R can be considered a viable alternative to SAS for generating specialized clinical trial deliverables such as SDTM and ADaM data sets, tables, and figures. In this work we generate an ADaM-compliant ADSL data set (Subject Level Analysis Data set) using R. R packages such as sas7bdat, dplyr, tidyr, parsedate and Hmisc are used and compared to SAS functions in terms of their computational efficiency. This paper details the typical steps used to create the ADSL data set, which begin with reading the various SDTM data sets, followed by a procedure to transpose the SUPPDM data set and merge it with the DM data set. A procedure to extract the EX and DS variables from the EX and DS data sets respectively and then merge them with the final DM data set is detailed. We also demonstrate how to derive numeric variables, flags, treatment variables and trial dates for the ADSL data set. Finally, the R procedures to attach labels to the variables are discussed; this process culminates in the export of the final ADSL data set. A side-by-side comparison between R and SAS procedures is outlined. Certain weaknesses in R, such as attaching labels to variables, have been resolved in this work. Finally, the challenges encountered in generating the ADSL data set using R are discussed and compared to SAS.

EP-353 : Visually Exploring Proximity Analyses Using SAS® PROC GEOCODE and SGMAP and Public Use Data Sets
Louise Hadden, Abt Associates Inc.

Numerous international and domestic governments provide free public access to downloadable databases containing health data. Two examples include the Demographic and Health Surveys, which include data from Afghanistan to Zimbabwe, and the Centers for Medicare and Medicaid Services' Compare data and Part D Prescriber public use files. This paper and presentation will describe the process of downloading data and creating an analytic database which includes geographic data; running SAS® PROC GEOCODE (part of Base SAS®) using TIGER street address level data to obtain latitude and longitude at a finer level than zip code; and finally using PROC SGMAP (part of Base SAS®) with annotation to create a visualization of a proximity analysis.

EP-355 : Integrating Your Analytics In Database with SAS, Hadoop and other data types in Teradata EDW using Agile Analytics to analyze many different data types.
Bob Matsey, Teradata

Many pharmaceutical companies are leveraging Agile analytics to improve turnaround time and business analysts' performance in various aspects of the business. With Agile analytics and Data Labs, business analysts, data scientists and business users are empowered to test, prototype and experiment with a hypothesis by having direct access to many types of corporate data without needing requirements definition, projects to define data, engaging IT to load their data, and other processes and procedures that bottleneck the ‘quick’ nature of Agile analytics insights. We will discuss techniques for bringing multiple data types (SAS, Hadoop, Excel spreadsheets, external data, etc.) together quickly in a massively parallel environment in Data Labs, using in-database processing with SAS, R or Python to analyze your data quickly and get the results back in a fraction of the time it used to take. We will also share technology capabilities, best practices and success stories in the pharmaceutical industry using these agile techniques and the technologies that enable you to implement this quickly and efficiently!

EP-362 : SDSP: Sponsor and FDA Liaison
Bhanu Bayatapalli, University of Thiruvalluvar at INDIA

The discussion between a sponsor and FDA on data standards for statistical programming deliverables in an electronic submission should start at the early stages of product development and continue along the way to filing. This discussion will involve data standards, structures, and versions to be used for each study submitted with an NDA or BLA. The Study Data Standardization Plan (SDSP) is used as a tool to communicate with FDA on these aspects. Sponsors and applicants are encouraged to utilize established FDA-sponsor meetings (e.g., pre-IND, end of phase 2, Type B/C) to share and discuss the SDSP.

EP-367 : Confirmation of Best Overall Tumor Response in Oncology Clinical Trials per RECIST 1.1
Danyang Bing, ICON Clinical Research
Randi McFarland, ICON Clinical Research

In oncology clinical trials for solid tumors, the revised RECIST guideline (version 1.1) is the standard guidance for response evaluation. Many statistical programmers with oncology experience are familiar with tumor burden calculations and deriving best overall response following RECIST v1.1, but have limited experience with confirmation of response. In non-randomized trials where response is the primary endpoint, confirmation of partial response (PR) or complete response (CR) is required and handled in response analysis datasets. The RECIST v1.1 guideline does not provide the logic to handle response scenarios for all data. Clarification of the confirmation logic to use for specific scenarios based on RECIST v1.1 is presented. Subsequent timeline requirements and minimum durations for stable disease (SD) are addressed. Finally, handling of intervening responses of SD or not evaluable (NE) between two CR or PR response time points is explained.
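
A simplified version of response confirmation can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's logic and not a complete RECIST v1.1 implementation. The function `confirmed_response` and the sample visits are hypothetical; the 28-day minimum interval and the rule that only SD or NE may occur between the initial and confirming responses are assumptions exposed as parameters, since both are typically protocol-defined.

```python
from datetime import date


def confirmed_response(assessments, min_days=28, allowed_between=("SD", "NE")):
    """Return the best confirmed response ('CR', 'PR') or None.

    assessments: list of (date, response) tuples sorted by date.
    Assumptions: a CR is confirmed by a later CR, a PR by a later PR or CR,
    at least min_days after the initial response, with only responses in
    allowed_between occurring in the interval.
    """
    rank = {"PR": 1, "CR": 2}
    best = None
    for i, (first_date, first) in enumerate(assessments):
        if first not in rank:
            continue
        for later_date, later in assessments[i + 1:]:
            if later in rank and rank[later] >= rank[first]:
                if (later_date - first_date).days >= min_days:
                    if best is None or rank[first] > rank[best]:
                        best = first
                    break
                # Too early to confirm: keep looking at later assessments.
            elif later not in allowed_between:
                break  # e.g. PD ends the search for this initial response
    return best


# Hypothetical subject: PR, then SD, then PR again 35 days after the first PR.
visits = [
    (date(2020, 1, 1), "PR"),
    (date(2020, 1, 20), "SD"),
    (date(2020, 2, 5), "PR"),
]
print(confirmed_response(visits))
```

Keeping the interval and the permitted intervening responses as parameters makes it easier to align the derivation with a given protocol or statistical analysis plan.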