Paper presentations are the heart of a PharmaSUG conference. Here is the list including the next batch of confirmed paper selections. Papers are organized into 12 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 02-Apr-2024.


Advanced Programming

Paper No. Author(s) Paper Title
AP-102 Derek Morgan Creating Dated Archives Automatically with SAS®
AP-108 Bart Jablonski Macro Variable Arrays Made Easy with macroArray SAS package
AP-135 Lisa Mendez
& Richann Watson
LAST CALL to Get Tipsy with SAS®: Tips for Using CALL Subroutines
AP-138 Timothy Harrington An Introduction to the SAS Transpose Procedure and its Options
AP-144 Charu Shankar SAS® Super Duo: The Program Data Vector and Data Step Debugger
AP-175 Jeffrey Meyers Tips for Completing Macros Prior to Sharing
AP-191 Songgu Xie
& Michael Pannucci
& Weiming Du
& Huibo Xu
& Toshio Kimura
Comprehensive Evaluation of Large Language Models (LLMs) Such as ChatGPT in Biostatistics and Statistical Programming
AP-195 Yi Guo A Simple Way to Make Adaptive Pages in Listings and Tables
AP-212 Samiul Haque
& Jim Box
R Shiny and SAS Integration: Execute SAS Procs from Shiny Application
AP-218 Xinran Luo
& Weijie Yang
Potentials and Caveats When Using ChatGPT for Enhanced SAS Macro Writing
AP-229 Vicky Yuan Create a Shift Summary of Laboratory Values in CTCAE Grade to the Worst Grade Abnormal Value using R and SASSY System
AP-237 Stephen Sloan Getting a Handle on All of Your SAS® 9.4 Usage
AP-252 Frank Canale Externally Yours - Adeptly Managing Data Outside Your EDC System
AP-253 James Austrow Build Your Own PDF Generator: A Practical Demonstration of Free and Open-Source Tools
AP-256 Ian Sturdy Leveraging ChatGPT in Statistical Programming in the Pharmaceutical Industry
AP-268 Xiangchen Cui
& Jessie Wang
& Min Chen
A New Approach to Automating the Creation of the Subject Visits (SV) Domain
AP-289 Jianfeng Wang
& Li Cheng
Programming with SAS PROC DS2: Experience with SDTM/ADaM
AP-295 David Bosak Replicating SAS® Procedures in R with the PROCS Package
AP-298 Huitong Niu
& Yan Wang
Comparison of Techniques in Merging Longitudinal Datasets with Errors on Date Variable: Fuzzy Matching versus Clustering Analysis
AP-349 Richann Watson
& Louise Hadden
Just Stringing Along: FIND Your Way to Great User-Defined Functions
AP-361 Chary Akmyradov Efficient Repetitive Task Handling in SAS Programming Through Macro Loops
AP-411 Lex Jansen Running the CDISC Open Rules Engine (CORE) in BASE SAS
AP-420 Adam Yates
& Misti Paudel
& Fengming Hu
Generation of Synthetic Data for Clinical Trials in Base SAS using a 2-Phase Discrete-Time Markov and Poisson Rare Event Framework
AP-424 Magnus Mengelbier Adding the missing audit trail to R

Data Standards

Paper No. Author(s) Paper Title
DS-109 Philip Mason Analyzing your SAS log with user defined rules using an app or macro.
DS-130 Wanchian Chen SDTM Specifications and Datasets Review Tips
DS-150 Laura Elliott
& Ben Bocchicchio
Assurance in the Digital Age: Automating MD5 Verification for uploading data into a Cloud based Clinical Repository
DS-154 Richann Watson
& Elizabeth Dennis
& Karl Miller
Exploit the Window of Opportunity: Exploring the Use of Analysis Windowing Variables
DS-188 Wei Shao
& Xiaohan Zou
Automated Harmonization: Unifying ADaM Generation and Define.xml through ADaM Specifications
DS-193 Inka Leprince
& Richann Watson
Around the Data DOSE-y Doe, How Much Fun Can Your Data Be: Using DOSExx Variables within ADaM Datasets
DS-204 Sandra Minjoe ADaM Discussion Topics: PARQUAL, ADPL, Nadir
DS-205 Crystal Cheng A New Way to Automate Data Validation with Pinnacle 21 Enterprise CLI in LSAF
DS-214 Sujana Katta
& Srinivas Macherla
Data Quality Framework application for Data management using R Shiny
DS-271 Alec McConnell
& Yun Peng
Programming Considerations in Deriving Progression-Free Survival on Next-Line Therapy (PFS2)
DS-274 Kristin Kelly
& Michael Beers
Guidance Beyond the SDTM Implementation Guide
DS-276 Soumya Rajesh Your Guide to Successfully Upversioning CDISC Standards
DS-280 Laura Fazio
& Andrew Burd
& Emily Murphy
& Melanie Hullings
I Want to Break Free: CRF Standardization Unleashing Automation
DS-287 Lihui Deng
& Kylie Fan
& Jia Li
ADaM Design for Prostate Cancer Efficacy Endpoints Based on PCWG3
DS-305 Vibhavari Honrao Guideline for Creating Unique Subject Identifier in Pooled studies for SDTM
DS-310 Pritesh Desai
& Mary Liang
Converting FHIR to CDASH using SAS
DS-342 Karin LaPann CDISC Therapeutic Area User Guides and ADaM Standards Guidance
DS-353 Anbu Damodaran
& Ram Gudavalli
& Kumar Bhimavarapu
Protocol Amendments and EDC Updates: Downstream impact on Clinical Trial Data
DS-360 Swaroop Kumar Koduri
& Shashikant Kumar
& Sathaiah Sanga
A quick guide to SDTM and ADaM mapping of liquid Oncology Endpoints.
DS-367 Wei Duan Handling of Humoral and Cellular Immunogenicity Data in SDTM
DS-374 Reshma Radhakrishnan Implementation of composite estimands for responder analysis based on change from baseline in non-solid tumours
DS-388 Rubha Raghu
& Sugumaran Muthuraj
& Vijayakumar Radhakrishnan
& Nithiyanandhan Ananthakrishnan
Advancing the Maturation of Standardized CRF Design
DS-398 Varsha Mithun Patil
& Mrityunjay Kumar
Streamlining Patient-reported outcome (PRO) data standardization & analysis
DS-400 Steve Ross
& Ilan Carmeli
AI and the Clinical Trial Validation Process - Paving a Rocky Road
DS-406 Santosh Ranjan Game changer! The new CDISC ADaM domain ADNCA for PK/PD data analysis

Data Visualization and Reporting

Paper No. Author(s) Paper Title
DV-127 Louise Hadden The Missing(ness) Piece: Building Comprehensive, Data Driven Missingness Reports and Codebooks Dynamically
DV-155 Jeffrey Meyers Combining Functions and the POLYGON Plot to Create Unavailable Graphs Including Sankey and Sunburst Charts
DV-170 Kirk Paul Lafler Creating Custom Excel Spreadsheets with Built-in Autofilters Using SAS® Output Delivery System (ODS)
DV-186 Ilya Krivelevich
& Cixin He
& Binbin Zhang-Wiener
& Wenyin Lin
Enhanced Spider Plot in Oncology
DV-196 Yi Guo Comparing SAS® and R Approaches in Creating Multicell Dot Plots in Statistical Programming
DV-216 Margaret Wishart
& Tamara Martin
Utilizing Data Visualization for Continuous Safety and Efficacy Monitoring within Early Development
DV-222 Mrityunjay Kumar
& Shashikant Kumar
Kaplan-Meier Graph: a comparative study using SAS vs R
DV-246 Indraneel Narisetty AutoVis Oncology Presenter: Automated Python-Driven Statistical Analysis and Visualizations for Powerful Presentations
DV-261 Karthik Venkataraman
& Gayathri Ravikumar
RWE, Big Data and ML for product innovation in medical devices
DV-278 Kuldeep Sen Standardization of the Patient Narrative Using a Metadata-driven Approach
DV-283 Tongda Che
& Danfeng Fu
Exploring the Application of FDA Medical Query (FMQ) in Visualizing Adverse Event Data
DV-293 Dave Hall Splashy Graphics Suitable for Publication? ODS LAYOUT Can Do It!
DV-313 Kostiantyn Drach
& Iryna Kotenko
Visual discovery in Risk-Based Monitoring using topological models
DV-323 Chevell Parker Tales From A Tech Support Guy: The Top Ten Most Impactful Reporting and Data Analytic Features for the SAS Programmer
DV-327 Junze Zhang
& Chuanhai Tang
& Xiaohui Wang
A R Markdown Structure for Automatically Generating Presentation Slides
DV-328 Chevell Parker Next level Reporting: ODS and Open Source
DV-331 Kirk Paul Lafler Ten Rules for Better Charts, Figures and Visuals
DV-348 Murali Kanakenahalli
& Annette Bove
& Smita Sehgal
Periodic safety reports of clinical trials
DV-380 Tracy Sherman
& Aakar Shah
Amazing Graph Series: Swimmer Plot - Visualizing the Patient Journey: Adverse Event Severity, Medications, and Primary Endpoint
DV-382 Helena Belloff
& William Lee
& Melanie Hullings
A 'Shiny' New Perspective: Unveiling Next-Generation Patient Profiles for Medical and Safety Monitoring
DV-389 Vijayakumar Radhakrishnan
& Nithiya Ananthakrishnan
Automation and integration of data visualization using R ESQUISSE & R SHINY
DV-395 Pradeep Acharya
& Anurag Srivastav
Pictorial Representation of Adverse Events (AE) Summary - A new perspective to look at the AE data in Clinical Trials
DV-396 Yun Ma
& Yifan Han
Piloting data visualization and reporting with Rshiny apps
DV-433 Steve Wade
& Sudhir Kedare
& Matt Travell
& Chen Yang
& Jagan Mohan Achi
Interactive Data Analysis and Exploration with composR: See the Forest AND the Trees
DV-438 Kevin Viel Exploring DATALINEPATTERNS, DATACONTRASTCOLORS, DATASYMBOLS, the SAS System® REGISTRY procedure, and Data Attribute Maps (ATTRMAP) to assign invariant attributes to subjects and arms throughout a proj
DV-455 Vandita Tripathi
& Manas Saha
Reimagining reporting and Visualization during clinical data management
DV-456 Joshua Cook An introduction to Quarto: A Versatile Open-source Tool for Data Reporting and Visualization
DV-458 Joshua Cook
& Kirk Paul Lafler
Quarto 1.4: Revolutionizing Open-source Dashboarding Capabilities

Hands-on Training

Paper No. Author(s) Paper Title
HT-101 Mathura Ramanathan
& Nancy Brucken
Deep Dive into the BIMO (Bioresearch Monitoring) Package Submission
HT-111 Bart Jablonski A Gentle Introduction to SAS Packages
HT-118 Philip Holland The Art of Defensive SAS Programming
HT-143 Charu Shankar The New Shape Of SAS Code
HT-152 Phil Bowsher GenAI to Enhance Your Statistical Programming
HT-157 Jayanth Iyengar Understanding Administrative Healthcare Datasets using SAS® programming tools.
HT-197 Dan Heath Building Complex Graphics from Simple Plot Types
HT-201 Ashley Tarasiewicz
& Chelsea Dickens
Transitioning from SAS to R
HT-413 Richann Watson
& Josh Horstman
Complex Custom Clinical Graphs Step by Step with SAS® ODS Statistical Graphics
HT-459 Troy Martin Hughes Hands-on Python PDFs: Using the pypdf Library To Programmatically Design, Complete, Read, and Extract Data from PDF Forms Having Digital Signatures

Leadership Skills

Paper No. Author(s) Paper Title
LS-134 Patrick Grimes Recruiting Neurodivergent Candidates using the Specialisterne Approach
LS-167 Kirk Paul Lafler Soft Skills to Gain a Competitive Edge in the 21st Century Job Market
LS-176 Jeff Xia
& Simiao Ye
Effectively Manage the Programming Team Using MS Team
LS-286 Priscilla Gathoni Unlock Your Greatness: Embrace the Power of Coaching
LS-304 Diana Avetisian Translation from statistical to programming: effective communication between programmers and statisticians
LS-317 LaNae Schaal What Being a Peer-to-Peer Mentor Offers - Perspective from an Individual Project Level Contributor
LS-335 Monali Khanna Creating a Culture of Engagement - Role of a Manager
LS-345 Christiana Hawn
& Lily Ray
Leadership Lessons from Another Life: How my Previous Career Helped Me as a Statistician
LS-351 Anbu Damodaran
& Neha Srivastava
A Framework for Risk-Based Oversight for Fully Outsourced Clinical Studies
LS-357 Purvi Kalra
& Varsha Patil
Harmony in Motion: Nurturing Work-Life Balance for Sustainable Well-being
LS-371 Dilip Raghunathan Go Get 'Em: Manager's Guide to Make a Winning Business Proposal for Technology Solutions
LS-383 Mathura Ramanathan Ongoing Trends and Strategies to Fine-tune the CRO/Sponsor Partnership - Perspectives from Statistical Programming
LS-410 Josh Horstman
& Richann Watson
Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
LS-443 Melanie Hullings
& Andrew Burd
& Helena Belloff
& Emily Murphy
Data Harmony Revolution: Rocking Trials with Clinical Data Literacy

Metadata Management

Paper No. Author(s) Paper Title
MM-225 Kang Xie Variable Subset Codelist
MM-226 Jeetender Chauhan
& Madhusudhan Ginnaram
& Sarad Nepal
& Jaime Yan
Methodology for Automating TOC Extraction from Word Documents to Excel
MM-240 Avani Kaja Managing a Single Set of SDTM and ADaM Specifications across All Your Phase 1 Trials
MM-245 Trevor Mankus Relax with Pinnacle 21's RESTful API
MM-267 Xiangchen Cui
& Min Chen
& Jessie Wang
A Practical Approach to Automating SDTM Using a Metadata-Driven Method That Leverages CRF Specifications and SDTM Standards
MM-358 Lakshmi Mantha
& Purvi Kalra
& Arunateja Gottapu
Optimizing Clinical Data Processes: Harnessing the Power of Metadata Repository (MDR) for Innovative Study Design (ISD) and Integrated Summary of Safety (ISS) / Efficacy (ISE)
MM-447 Vandita Tripathi
& Manas Saha
Automating third party data transfer through digitized Electronic DTA Management

Real World Evidence and Big Data

Paper No. Author(s) Paper Title
RW-125 Ajay Gupta
& Natalie Dennis
Reconstruction of Individual Patient Data (IPD) from Published Kaplan-Meier Curves Using Guyot's Algorithm: Step-by-Step Programming in R
RW-227 Yu Feng A SAS® Macro Approach: Defining Line of Therapy Using Real-World Data in Oncology
RW-260 Karthik Venkataraman
& Rajesh Karthikeyan
Fidelity Assessment of Real-World Data as An External Control Arm
RW-275 Catherine Briggs
& Sherrine Eid
& Samiul Haque
& Robert Collins
Win a PS5! How to Run and Compare Propensity Score Matching Performance Across Multiple Algorithms in Five Minutes or Less
RW-390 Ryan Lafler
& Anna Wade
Unraveling the Layers within Neural Networks: Designing Artificial and Convolutional Neural Networks for Classification and Regression Using Python's Keras & TensorFlow
RW-421 Sherrine Eid
& Robert Collins
& Samiul Haque
Applications of Machine Learning and Artificial Intelligence in Real World Data in Personalized Medicine for Non-Small Cell Lung Cancer Patients
RW-450 Lorraine Johnson
& Lara Kassab
& Jingyi Liu
& Deanna Needell
& Mira Shapiro
Towards understanding Neurological manifestations of Lyme disease through a machine learning approach with patient registry data
RW-453 Joshua Cook
& Achraf Cohen
Interfacing with Large-scale Clinical Trials Data: The Database for Aggregate Analysis of

Solution Development

Paper No. Author(s) Paper Title
SD-114 Bruce Gilsen SAS® Program Efficiency for Beginners
SD-141 Kevin Lee "Prompt it", not "Google it" : Prompt Engineering for Statistical Programmers and Biostatisticians
SD-165 Kirk Paul Lafler
& Ryan Lafler
& Joshua Cook
& Stephen Sloan
Benefits, Challenges, and Opportunities with Open-Source Software Integration
SD-166 Kirk Paul Lafler The 5 CATs in the Hat - Sleek Concatenation String Functions
SD-179 Jim Box
& Samiul Haque
Developing Web Apps in SAS Visual Analytics
SD-198 Chengxin Li AutoSDTM Design and Implementation With SAS Macros
SD-200 Illia Skliar Bridging AI and Clinical Research: A New Era of Data Management with ChatGPT
SD-211 Matt Maloney Utility Macros for Data Exploration of Clinical Libraries
SD-217 William Wei
& Shunbing Zhao
Semi-Automated and Modularized Approach to Generate Tables for Clinical Study - Categorical Data Report
SD-239 Hong Qi
& Mary Varughese
Automation of Report Generation Beyond Macro
SD-241 Umayal Annamalai
& Srihari Hanumantha
& Madhu Annamalai
Dynamic patient-centric reporting using R Markdown
SD-243 Lakshmi Mantha
& Inbasakaran Ramesh
Unravelling the SDTM Automation Process through the Utilization of SDTM Transformation Template
SD-255 Danfeng Fu
& Dickson Wanjau
& Ben Gao
Define-XML Conversion: A General Approach on Content Extraction Using Python
SD-262 Bart Jablonski Integration of SAS GRID environment and SF-36 Health Survey scoring API with SAS Packages
SD-266 Amy Zhang
& Huei-Ling Chen
A Tool for Automated Comparison of Core Variables Across ADaM Specifications Files
SD-308 Arunateja Gottapu AI-Enhanced Clinical Data Interaction: Enhancing Data Management, Programming, and Validation Using LangChain and Pandas in Python
SD-318 Yunsheng Wang
& Erik Hansen
& Chao Wang
& Tina Wu
Streamlined EDC data to SDTM Mapping with Medidata RAVE ALS
SD-343 Benjamin Straub Two hats, one noggin: Perspectives on working as a developer and as a user of the admiral R package for creating ADaMs.
SD-356 Kevin Viel Standardizing Validation Data Sets (VALDS) as matrices indexed by Page, Section, Row, and Columns (PSRC) to improve Validation and output creation and revisions.
SD-365 Zhihao Luo Readlog Utility: Python based Log Tool and the First Step of a Comprehensive QC System
SD-370 Yongjiang (Jerry) Xu
& Karen Xu
& Suzanne Viselli
Enhancing FDA Debarment List Compliance through Automated Data Analysis Using Python and SAS
SD-397 Saurabh Das
& Rohit Kadam
& Rajasekhar Gadde
& Niketan Panchal
& Saroj Sah
Advancing Regulatory Intelligence with conversational and generative AI
SD-401 Xinran Hu
& Jeff Xia
Excel Email Automation Tool: Streamlining Email Creation and Scheduling
SD-412 Sherrine Eid
& Sundaresh Sankaran
Safety Signals from Patient Narratives PLUS: Augmenting Artificial Intelligence to Enhance Generative AI Value
SD-426 Rajprakash Chennamaneni
& Sudhir Kedare
& Jagan Mohan Achi
Shift gears with 'gt': Finely tuned clinical reporting in R using "gt" and "gt summary" packages
SD-429 Bill Zhang
& Jun Yang
Build up Your Own ChatGPT Environment with Azure OpenAI Platform
SD-431 Steve Wade
& Sudhir Kedare
& Matt Travell
& Chen Yang
& Jagan Mohan Achi
inspectoR: QC in R? No Problem!
SD-444 Troy Martin Hughes Five Reasons To Swipe Right on PROC FCMP, the SAS Function Compiler for Building Modular, Maintainable, Readable, Reusable, Flexible, Configurable User-Defined Functions and Subroutines

Statistics and Analytics

Paper No. Author(s) Paper Title
ST-113 Girish Kankipati
& Jai Deep Mittapalli
Multiple Logistic Regression Analysis using Backward Selection Process on Objective Response Data with SAS®
ST-164 Kirk Paul Lafler Data Literacy 101: Understanding Data and the Extraction of Insights
ST-192 Igor Goldfarb
& Sharma Vikas
Generative Artificial Intelligence in sample size estimation - challenges, pitfalls, and conclusions
ST-199 Yuting Peng
& Ruohan Wang
Demystifying Incidence Rates: A Step-by-Step Guide to Adverse Event Analysis for Novice Programmers
ST-208 Vadym Kalinichenko Bayesian Methods in Survival Analysis: Enhancing Insights in Clinical Research
ST-234 Stephen Sloan A unique and innovative end-to-end demand planning and forecasting process using a collection of SAS products
ST-251 Isabella Wang
& Jin Xie
& Lauren George
Dealing with Missing Data: Practical Implementation in SAS and R
ST-297 Christiana Hawn
& Dhruv Bansal
Relative Dose Intensity in Oncology Trials: A Discussion of Two Approaches
ST-303 Ibrahim Priyana Hardjawidjaksana
& Els Janssens
& Ellen Winckelmans
Source Data Quality Issues in PopPK/PD Dataset Programming: a Systematic Approach to Handle Duplicates
ST-334 Ethan Brockmann
& Dong Xi
Versatile and efficient graphical multiple comparison procedures with {graphicalMCP}
ST-338 Michael Lamm Bayesian Additive Regression Trees for Counterfactual Prediction and Estimation
ST-339 Fang Chen
& Yi Gong
Bayesian Hierarchical Models with the Power Prior Using PROC BGLIMM
ST-366 Chuck Kincaid MLNR or Machine Learning in R
ST-381 Peng Zhang
& Lizhong Liu
& Tai Xie
Opportunities and Challenges for R as an open-sourced solution for statistical analysis and reporting, from vendor's perspective
ST-414 Richard Moreton
& Lata Maganti
Estimating Time to Steady State Analysis in SAS
ST-425 Sudhir Kedare
& Steve Wade
& Chen Yang
& Matthew Travell
& Jagan Mohan Achi
iCSR: A Wormhole to Interactive Data Exploration Universe

Strategic Implementation & Innovation

Paper No. Author(s) Paper Title
SI-136 Ke Xiao Agile, Collaborative, Efficient (ACE): A New Perspective on Data Monitoring Committee Data Review Preparation
SI-140 Kevin Lee A fear of missing out and a fear of messing up : A Strategic Roadmap for ChatGPT Integration at Company Level
SI-160 Jason Zhang
& Jaime Yan
LLM-Enhanced Training Agent for Statistical Programming
SI-185 Binal Mehta
& Patel Mukesh
The Role of the Blinded Programmer in Preparation of Data Monitoring Committee Packages (for Clinical Trials)
SI-189 Vidya Gopal Automating the annotation of TLF mocks Using Generative AI
SI-190 Ruohan Wang
& Chris Qin
Navigating Success: Exploring AI-Assisted Approaches in Predicting and Evaluating Outcome of Clinical Trials and Submissions
SI-230 Todd Case
& Margaret Huang
Quality Assurance within Statistical Programming: A Systemic Way to Improve Quality Control
SI-269 Juliane Manitz
& Anuja Das
& Antal Martinecz
& Jaxon Abercrombie
& Doug Kelkhoff
Validating R for Pharma - Streamlining the Validation of Open-Source R Packages within Highly Regulated Pharmaceutical Work
SI-291 Nancy Brucken
& Mary Nilsson
& Greg Ball
PHUSE Safety Analytics Working Group - Overview and Deliverables Update
SI-319 Lydia King A Change is Gonna Come: Maintaining Company Culture, Managing Time Zones, and Integrating Teams after a Global Acquisition
SI-346 Chaitanya Pradeep Repaka
& Santhosh Karra
aCRF Copilot: Pioneering AI/ML Assisted CRF Annotation for Enhanced Clinical Data Management Efficiency
SI-362 Karma Tarap
& Nicole Thorne
& Tamara Martin
& Derek Morgan
& Pooja Ghangare
SASBuddy: Enhancing SAS Programming with Large Language Model Integration
SI-391 Chaitanya Pradeep Repaka
& Santhosh Karra
Facilitating Seamless SAS-to-R Transition in Clinical Data Analysis: A Finetuned LLM Approach
SI-408 Manuela Koska
& Veronika Csom
Agile Sponsor Oversight of Statistical Programming Activities
SI-446 Shilpa Sood
& Sridhar Vijendra
One size does not fit all: The need and art of customizing SCE and MDR for end users
SI-452 Amit Javkhedkar
& Sridhar Vijendra
Embracing Diversity in Statistical Computing Environments: A Multi-Language Approach
SI-461 Ishwar Chouhan Oncology ADaM Datasets Creation Using R Programming: A Comprehensive Approach

Submission Standards

Paper No. Author(s) Paper Title
SS-132 Jai Deep Mittapalli
& Girish Kankipati
BIMO Brilliance: Your Path to Compliance Resilience
SS-133 Jai Deep Mittapalli
& Jinit Mistry
& Venkatesulu Salla
Cultivating Success with Non-standard Investigator-sponsored Trial Data for FDA Submissions
SS-137 David Izard Study Start Date - Let's Get it Right!
SS-213 Sandra Minjoe Is a Participation-Level ADaM Dataset a Solution for Submitting Integration Data to FDA?
SS-263 Vicky Yuan Creating Adverse Event Tables using R and SASSY System
SS-285 Vishwateja Maduri
& Purushotham Namburi
& Prashanth Kasarla
Submission Requirements and Magnification of Differences among Regulatory Authorities
SS-290 Robin Wu
& Lili Li
& Steven Huang
Combine PDFs in Submission-ready Format Quick and Easy
SS-306 Swaroop Neelapu Leveraging SAS and Adobe Plug-in for CRF Bookmark Generation (Rave studies)
SS-311 Yilan Xu
& Hu Qu
& Tina Wu
How to generate a submission ready ADaM for complex data
SS-333 Hanne Ellehoj
& Veeresh Namburi
Lead-in and extension trials, how we documented datapoint traceability
SS-344 Benjamin Straub Piloting into the Future: Publicly available R-based Submissions to the FDA
SS-363 Hiba Najeeb
& Raghavender Ranga
A Programmer's Insight into an Alternative to TQT Study Data Submission
SS-368 Rashmi Gundaralahalli Ramesh
& Jeffrey Lavenberg
Design Considerations for ADaM Protocol Deviations Dataset in Vaccine Studies
SS-376 André Veríssimo
& Ismael Rodriguez
Experimenting with Containers and webR for Submissions to FDA in the Pilot 4
SS-377 Wei Duan Challenges and Considerations When Building e-Submission SDTM Data Packages
SS-422 Flora Mulkey Submitting Patient-Reported Outcome Data in Cancer Clinical Trials- Guidance for Industry Technical Specifications Document

Posters

Paper No. Author(s) Paper Title
PO-106 Xianhua Zeng No LEAD Function? Let's Create It!
PO-123 Kevin Sun Enhancing Define-XML Generation: Based on SAS Programming and Pinnacle 21 Community
PO-128 Louise Hadden A Deep Dive into Enhancing SAS/GRAPH® and SG Procedural Output with Templates, Styles, Attributes, and Annotation
PO-129 Varsha Ganagalla
& Natalie Johnson
The Survival Mode
PO-145 Jason Su Integrity, Please: Three Techniques for One-Step Solution in Pharmaceutical Programming
PO-158 Jayanth Iyengar If it's not broke, don't fix it; existing code and the programmers' dilemma
PO-194 Elizabeth Li
& Carl Chesbrough
& Inka Leprince
Updates on Preparing a BIMO Data Package
PO-231 Michael Stout Best Function Ever: PROC FCMP
PO-258 Madhavi Gundu
& Vivek Jayesh Mandaliya
An approach to make Data Validation and Reporting tool using R Shiny for Clinical Data Validation
PO-292 Julie Ann Hood
& Jennifer Manzi
Elevate Your Game: Leveling Up SDTM Validation with the Magic of Data Managers
PO-299 Chen Li
& Hong Wang
& Ke Xiao
Upholding Blinding in Clinical Trials: Strategies and Considerations for Minimizing Bias
PO-324 David Franklin Plotting Data by US ZIP Code
PO-407 Kena Patel
& Jingying Zhou
Tips and Tricks for using the CMS Platform
PO-418 Darren Jeng
& Sachin Heerah
RWD Exploration through R Shiny
PO-440 Oliver Lu
& Katie Watson
The SAS Genome - Genetic Sequencing
PO-451 Vandita Tripathi
& Manas Saha
Simplifying Edit Check Configuration


Advanced Programming

AP-102 : Creating Dated Archives Automatically with SAS®
Derek Morgan, Bristol Myers Squibb

When creating patient profiles, it can be useful for clinical scientists to compare current data with previous data in real time without having to request those data from an Information Technology (IT) source. This is a method for using SAS® to perform the archiving via a scheduled daily job. The primary advantage of SAS over an operating system script is its date handling ability, which replaces many difficult calculations with intervals and functions. This paper details an application that creates dated archive folders and copies SAS data sets into those dated archives, with automated aging and deletion of old data and folders. The application allows clinical scientists to customize their archive frequency (within certain limits). It also keeps storage requirements to a minimum as defined by IT. This replaced a manual process that required study programmers to create the archives, eliminating the possibility of missed or incorrectly dated archives. The flexibility required for this project and the conditions under which it ran required using SAS date and time intervals and their functions. SAS was used to manipulate the files and directories.
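
The archive-and-age cycle described above can be sketched in a few lines. This is not the paper's SAS application; it is a minimal Python analogue, and the folder naming convention (`DATE_FMT`), the function names, and the `keep_days` retention parameter are all hypothetical choices made for illustration.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path

DATE_FMT = "%Y-%m-%d"  # hypothetical naming convention for dated folders


def create_dated_archive(source: Path, archive_root: Path, run_date: date,
                         pattern: str = "*.sas7bdat") -> Path:
    """Copy files matching pattern from source into archive_root/<run_date>."""
    target = archive_root / run_date.strftime(DATE_FMT)
    target.mkdir(parents=True, exist_ok=True)
    for item in sorted(source.glob(pattern)):
        if item.is_file():
            shutil.copy2(item, target / item.name)
    return target


def purge_old_archives(archive_root: Path, run_date: date, keep_days: int) -> list:
    """Delete dated folders older than keep_days; return the removed names."""
    cutoff = run_date - timedelta(days=keep_days)
    removed = []
    for folder in sorted(archive_root.iterdir()):
        if not folder.is_dir():
            continue
        try:
            folder_date = datetime.strptime(folder.name, DATE_FMT).date()
        except ValueError:
            continue  # not a dated archive folder, so leave it untouched
        if folder_date < cutoff:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed
```

A scheduled daily job would call `create_dated_archive` with the current date and then `purge_old_archives` with the retention period agreed with IT. Folders whose names do not parse as dates are skipped so that the purge never touches unrelated directories.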

AP-108 : Macro Variable Arrays Made Easy with macroArray SAS package
Bart Jablonski, yabwon

A macro variable array is a jargon term for a list of macro variables with a common prefix and numerical suffixes. Macro arrays are valued by advanced SAS programmers and often used as "driving" lists, allowing sequential metadata for complex or iterative programs. Use of macro arrays requires advanced macro programming techniques based on indirect reference (aka, using multiple ampersands &&), which may intimidate less experienced programmers. The aim of the paper is to introduce the macroArray SAS package. The package makes creating and working with macro arrays much easier. It also provides a "DATA-step-arrays-like" interface that allows use of macro arrays without the complications that arise from indirect referencing. Also, the concept of a macro dictionary is presented, and all concepts are demonstrated through use cases and examples.

AP-135 : LAST CALL to Get Tipsy with SAS®: Tips for Using CALL Subroutines
Lisa Mendez, Catalyst Clinical Research
Richann Watson, DataRich Consulting

This paper provides an overview of six SAS CALL subroutines that are frequently used by SAS® programmers but are less well-known than SAS functions. The six CALL subroutines are CALL MISSING, CALL SYMPUTX, CALL SCAN, CALL SORTC/SORTN, CALL PRXCHANGE, and CALL EXECUTE. Instead of using multiple IF-THEN statements, the CALL MISSING subroutine can be used to quickly set multiple variables of various data types to missing. CALL SYMPUTX creates a macro variable that is either local or global in scope. CALL SCAN looks for the nth word in a string. CALL SORTC/SORTN is used to sort a list of values within a variable. CALL PRXCHANGE can redact text, and CALL EXECUTE lets SAS write your code based on the data. This paper will explain how those six CALL subroutines work in practice and how they can be used to improve your SAS programming skills.

AP-138 : An Introduction to the SAS Transpose Procedure and its Options
Timothy Harrington, Navitas Data Sciences

PROC TRANSPOSE is a SAS® procedure for arranging the contents of a dataset column from a vertical to a horizontal layout based on selected BY variables. This procedure is particularly useful for efficiently manipulating clinical trials data with a large number of observations and groupings, as is often found in laboratory analysis or vital signs data. The use of PROC TRANSPOSE is illustrated with examples showing different modes of arranging the data. Possible problems that can occur when using this procedure, and their solutions, are also discussed.
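
The vertical-to-horizontal rearrangement described above can be illustrated without SAS. The sketch below is a plain Python analogue, not PROC TRANSPOSE itself: the function name and its parameters (`by`, `id_col`, `value_col`) are hypothetical, and the vital signs variable names are sample data only. The duplicate check reflects one problem commonly met when transposing (the same ID value appearing twice in a BY group); the abstract does not say which problems the paper covers, so treat that choice as an assumption.

```python
def transpose_long_to_wide(rows, by, id_col, value_col):
    """Turn one row per (BY group, ID value) into one row per BY group."""
    wide = {}
    for row in rows:
        key = tuple(row[name] for name in by)
        group = wide.setdefault(key, {})
        new_col = row[id_col]
        if new_col in group:
            # A repeated ID value would silently overwrite data, so stop here.
            raise ValueError(
                f"ID value {new_col!r} occurs twice in BY group {key!r}"
            )
        group[new_col] = row[value_col]
    # One output record per BY group, in sorted BY order.
    return [{**dict(zip(by, key)), **wide[key]} for key in sorted(wide)]


vitals = [
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSSTRESN": 120},
    {"USUBJID": "001", "VSTESTCD": "DIABP", "VSSTRESN": 80},
    {"USUBJID": "002", "VSTESTCD": "SYSBP", "VSSTRESN": 135},
]
wide_vitals = transpose_long_to_wide(
    vitals, by=["USUBJID"], id_col="VSTESTCD", value_col="VSSTRESN"
)
```

Subject 002 has no DIABP record, so its output row simply lacks that key; a tabular tool would show a missing value in that cell instead.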

AP-144 : SAS® Super Duo: The Program Data Vector and Data Step Debugger
Charu Shankar, SAS Institute

Whether you are a self-taught SAS learner with a lot of experience, or a novice just entering the SAS universe, you may not have spent a lot of time delving into two fantastic SAS® superpowers. The Program Data Vector (PDV) is where SAS processes one observation at a time, in memory. The Data Step Debugger is an excellent tool to actually see the observation being held in memory and watch the movement of data from input to memory to output. Combining these two tools supplies SAS practitioners a lot of utility to "get under the hood" of how SAS code works in practice to ingest and analyze data during program operations. Once you know the specifics of what happens during compile time / execution, joins, and creating arrays, efficient SAS code will be at your fingertips. Action packed with animations, live demos and a great hands-on section, this presentation will likely be a resource that you will use and reuse now and in the future.

AP-175 : Tips for Completing Macros Prior to Sharing
Jeffrey Meyers, Regeneron Pharmaceuticals

SAS macros are a programmer's best friend when written well, and their worst nightmare when not. Macros are a powerful tool within SAS for automating complicated analyses or completing repetitive tasks. The next step after building a capable tool is to share it with others. The creator of the macro does not have much time to catch the attention of the user. Multiple errors, missing documentation or guides, and a lack of intuitive features push users away from the macro. This paper will focus on completing a macro to give the user the best possible experience prior to sharing.

AP-191 : Comprehensive Evaluation of Large Language Models (LLMs) Such as ChatGPT in Biostatistics and Statistical Programming
Songgu Xie, Regeneron Pharmaceuticals
Michael Pannucci, Arcsine Analytics
Weiming Du, Alnylam Pharmaceuticals
Huibo Xu, Greenwich High School
Toshio Kimura, Arcsine Analytics

Generative artificial intelligence using large language models (LLMs) such as ChatGPT is an emerging trend. However, discussions of using LLMs in biostatistics and statistical programming have been somewhat limited. This paper provides a comprehensive evaluation of major LLMs (ChatGPT, Bing AI, Google BARD, Anthropic Claude 2) in their utility within biostatistics and statistical programming (SAS and R). We tested major LLMs across several challenges: 1) Conceptual Knowledge, 2) Code Generation, 3) Error Catching/Correcting, 4) Code Explanation, and 5) Programming Language Translation. Within each challenge, we asked easy, medium and advanced difficulty level questions related to three topics: Data, Statistical Analysis, and Display Generation. After providing the same prompts to each LLM, responses were captured and evaluated. For some prompts, LLMs provided incorrect responses, also known as "hallucinations." Although LLMs replacing biostatisticians and statistical programmers may be overhyped, there are nevertheless use cases where LLMs are helpful in assisting statistical programmers.

AP-195 : A Simple Way to Make Adaptive Pages in Listings and Tables
Yi Guo, Pfizer Inc.

Generating listings and tables is an essential skill for every statistical programmer. The task of optimizing the display of listings and tables can be a challenging one for new programmers. This is because the true length of a variable can vary significantly between patient records, consequently affecting the number of pages in the listing, file size, loading time, and empty space on a page. Similarly, in tables, the number of distinct values of a categorical variable can vary as a study matures, and summarizing such categorical variables without affecting the page break can also be challenging - ideally, we would want to summarize a categorical variable on the same page when space allows. In this paper, we provide a simple algorithm that calculates the maximum number of observations to display on each page with an aim to optimize the display, the number of pages, the file size, and the loading time.
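As a rough illustration of the idea only (this is not the author's algorithm), the sketch below assumes a fixed column width and a fixed number of printable lines per page. It estimates how many lines each record needs once its text wraps, then packs records onto pages. The names `lines_needed`, `paginate`, `col_width`, and `lines_per_page` are hypothetical.

```python
import math

def lines_needed(text, col_width):
    """Estimate the lines a value occupies when wrapped at col_width characters."""
    return max(1, math.ceil(len(text) / col_width))

def paginate(records, col_width, lines_per_page):
    """Greedily assign records to pages so that no page exceeds lines_per_page.

    A single record longer than a full page still gets a page of its own.
    """
    pages, current, used = [], [], 0
    for rec in records:
        need = lines_needed(rec, col_width)
        # Start a new page when this record would overflow the current one.
        if current and used + need > lines_per_page:
            pages.append(current)
            current, used = [], 0
        current.append(rec)
        used += need
    if current:
        pages.append(current)
    return pages

# Four records whose true lengths differ widely (assumed example data).
records = ["a" * 10, "b" * 45, "c" * 5, "d" * 20]
pages = paginate(records, col_width=20, lines_per_page=4)
```

Because the page break depends on the wrapped length of each record rather than on a fixed record count, short records share a page while long ones push the break earlier.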

AP-212 : R Shiny and SAS Integration: Execute SAS Procs from Shiny Application
Samiul Haque, SAS Institute
Jim Box, SAS Institute

The integration of different programming languages and tools is pivotal for translational data science. R Shiny is the most popular tool for building web applications in R. However, biostatisticians and data scientists often prefer to leverage SAS Procs or macros for clinical decision making. The worlds of R Shiny and SAS do not need to be decoupled. R Shiny applications can incorporate SAS procs and analytics. In this work, we present mechanisms for integrating R Shiny and SAS. We demonstrate how SAS Procs and macros can be executed from an R Shiny front end and how SAS logs and results can be printed within the Shiny app.

AP-218 : Potentials and Caveats When Using ChatGPT for Enhanced SAS Macro Writing
Xinran Luo, Everest Clinical Research
Weijie Yang, Everest Clinical Research

AI language models like ChatGPT have impressed and even intimidated programmers. There are discussions of ChatGPT with examples of simple SAS steps and there are descriptions of various usages of ChatGPT without examples, but few papers discuss the use of ChatGPT in SAS macro development with examples. This paper explores the utility of ChatGPT in enhancing the process of writing SAS macros from scratch, using an example of checking SAS logs in batch on Windows, and comparing the process with that of using conventional search engines. The focus is not only on utilizing ChatGPT's capabilities to provide programmers with initial ideas of program structure when they encounter unusual work requests, but also on demonstrating its application in developing a robust macro by showing key steps of the conversations between programmers and ChatGPT. Although ChatGPT proves invaluable in offering insights and suggestions, it's imperative to acknowledge certain caveats. Not all responses provided by ChatGPT are infallible, especially in the context of technical domains like SAS programming. Emphasizing the importance of independent verification, this paper underscores the need for users, especially new learners of SAS, to scrutinize and validate the suggestions before implementation. This paper aims to empower SAS practitioners by showcasing how ChatGPT can complement their macro-writing endeavors. By highlighting both the potentials and limitations of leveraging AI language models like ChatGPT, this paper contributes to fostering a balanced and discerning approach towards utilizing AI-driven assistance in SAS programming and macro development.

AP-229 : Create a Shift Summary of Laboratory Values in CTCAE Grade to the Worst Grade Abnormal Value using R and SASSY System
Vicky Yuan, Incyte Corporation

A shift summary of laboratory values in CTCAE grade to the worst grade abnormal value is often required for laboratory data analysis and submission. The purpose of a CTCAE grade shift table is to present how results vary from baseline to post-baseline visits in the study. This paper will illustrate how to report a shift table using R and packages from the SASSY system. It will start from an example and explain its anatomy, then give a step-wise explanation of how to report the table in a .doc file. The example is interesting because it contains "internal" footnotes that can change on every page. The R product used in this paper is R SASSY package version 1.2.0 running in the RStudio environment.

AP-237 : Getting a Handle on All of Your SAS® 9.4 Usage
Stephen Sloan, Dawson D R

SAS is popular, versatile, and easy to use, so it proliferates rapidly through an organization. It handles systems integration, data movement, and advanced statistics and AI, has links to a large amount of file types (Oracle, Excel, text, and others), and can meet almost every need. In addition to SAS Base, there are a large number of specialized and special-purpose SAS products, some of the most popular of which are SAS STAT, SAS OR, SAS Graph, Enterprise Miner, and Forecast Server. SAS EG allows for quick creation of useful artifacts and then facilitates saving the generated code for later use. As a result, it is difficult to track all of the SAS programs and artifacts being used across an organization, and economies of scale can be overlooked and repetition and "reinventing the wheel" sometimes take place. Programs and macros developed in one area can be useful in other areas and, as I have found when I share my programs, the programs and macros can be improved by this internal crowd-sourcing. Understanding everywhere SAS is used is important when upgrading a system that makes heavy use of SAS, or when upgrading SAS itself to a new version like Viya. It also helps an organization identify which SAS products it is using and how much use these products are getting. To accomplish the above, we've developed a set of programs to search a Unix server or a Windows server or machine to find, catalog, and identify the SAS usage on the machine.

AP-252 : Externally Yours - Adeptly Managing Data Outside Your EDC System
Frank Canale, SoftwaRx, LLC

Programmers in the pharmaceutical industry are used to working with data that is entered into, and extracted from, a system commonly known as an EDC (Electronic Data Capture) system. When using data that is sourced from one of these systems, you can reliably count on the type of data you'll receive (normally SAS datasets), and if the EDC is set up well, a standard structure that provides output data containing CDISC/CDASH variable names. But what does one do when receiving data that is sourced outside the EDC system and received from other vendors? How do you manage this data, retrieve it, validate the structure, and even export it to a format allowing you to merge it with other more conventional SAS datasets?

AP-253 : Build Your Own PDF Generator: A Practical Demonstration of Free and Open-Source Tools
James Austrow, Cleveland Clinic

The PDF is one of the most ubiquitous file formats and can be read on nearly every computing platform. So how, in the year 2024, can it still be so inconvenient to perform basic editing tasks such as concatenating and merging files, inserting page numbers, and creating bookmarks? These features are often locked behind paid licenses in proprietary software or require that the documents be uploaded to a web server, the latter of which poses unacceptable security risks. In fact, the PDF is a public standard and there exist free, open-source libraries that make it easy to build in-house solutions for these and many other common use cases. In this paper, we demonstrate how to use Python to assemble and customize PDF documents into a final, polished deliverable. We will also lay the foundation for automating these tasks, which can save countless hours on reports that have to be prepared on a regular basis.

AP-256 : Leveraging ChatGPT in Statistical Programming in the Pharmaceutical Industry
Ian Sturdy, Eli Lilly and Company

This paper explores the potential benefits of incorporating ChatGPT, a state-of-the-art natural language processing model, in statistical programming within the pharmaceutical industry. By leveraging ChatGPT's capabilities, this technology can save time, money, and most importantly, your sanity. Programming often leads to frustration, anxiety, and sleepless nights trying to solve complex problems. Various practical applications and techniques that harness the power of ChatGPT will be described to reduce all of these. In a world where Artificial Intelligence threatens to take our jobs, this paper suggests methods of tapping into the untapped potential of ChatGPT to empower programmers with innovative tools, thereby increasing our value. When programming issues arise, no longer will you need to worry about judgement or hostility from others on online forums. ChatGPT is a powerful tool we have yet to fully leverage, and its benefits extend well beyond our imaginations, let alone this paper.

AP-268 : A New Approach to Automating the Creation of the Subject Visits (SV) Domain
Xiangchen Cui, Crisprtx Therapeutics
Jessie Wang, CRISPR Therapeutics
Min Chen, CRISPR Therapeutics

The creation of the subject visits (SV) domain is one of the most challenging tasks of SDTM programming. Aside from the small portion of mapping from raw dataset variables to SV variables, SV programming mainly consists of a more complex derivation process, which is totally different from that of other SDTM domains. The dynamic parts of the SV programming process, such as identifying raw datasets and their variables with both date/time and clinical visits, cause manual development of a SAS program to be time-consuming and error prone. Hence, automating its code generation would achieve and enhance efficiency and accuracy. This paper will present a new approach for SV automation based on the SDTM automation done in our previous paper, which leveraged CRF specifications from an EDC database and SDTM standards [1]. It will introduce the standard SV programming logic flow with 10 sequential steps, which leads us to develop an additional SAS-based macro named %SV_Code_Generator as an expansion to the macro introduced in [1]. The output of this macro achieves 100% automation of SV domain for the raw data collected per CRFs in a clinical study. This new approach guarantees all raw dataset variables related to subject visits are accounted for in SV programming thanks to the sequential programming automations. This automation allows for the generation of SV dataset to occur very early in the programming development cycle and makes developing programmatic quality checks for clinical data review and data cleaning more efficient and economically feasible.

AP-289 : Programming with SAS PROC DS2: Experience with SDTM/ADaM
Li Cheng, Vertex Pharmaceuticals Inc.

PROC DS2 is a procedure introduced with SAS Base 9.4. This procedure provides opportunities for SAS programmers to apply Object Oriented Programming (OOP) and multithread techniques in SAS programming and is a critical connection between 'traditional' SAS programming and programming in the SAS Viya platform. The goal of this paper is to pilot the use of PROC DS2 in the work of preparing clinical trial CDISC datasets. In this paper, PROC DS2 is tested in the programming of SDTM/ADaM on a server with the SAS Base 9.4 M3 release. After converting SDTM/ADaM programs written in the 'traditional' SAS programming language into PROC DS2 code, this paper presents the lessons learned and the notes taken when obstacles were overcome or bypassed. Furthermore, OOP and multithread techniques are explored for application to SDTM/ADaM programming. Programming setups with a standard folder structure are discussed, and the performance of OOP and multithread techniques is also evaluated.

AP-295 : Replicating SAS® Procedures in R with the PROCS Package
David Bosak

The "procs" package aims to simulate some commonly used SAS® procedures in R. The purpose of simulating SAS procedures is to make R easier to use and match statistical results. Another important motivation is to provide stable tools to work with in the pharmaceutical industry. The package replicates several of the most frequently used procedures, such as PROC FREQ, PROC MEANS, PROC TTEST, and PROC REG. The package also contains some data manipulation procedures like PROC TRANSPOSE and PROC SORT. This paper will present an overview of the package and provide demonstrations for each function.

AP-298 : Comparison of Techniques in Merging Longitudinal Datasets with Errors on Date Variable: Fuzzy Matching versus Clustering Analysis
Huitong Niu, Master of Science Student, Biostatistics, Fielding School of Public Health, University of California, Los Angeles
Yan Wang, Adjunct Assistant Professor, Public and Population Health, School of Dentistry, University of California, Los Angeles

This paper examines effective techniques for merging longitudinal datasets with key variable inaccuracies, focusing on date errors. Traditional SAS methods, like the DATA Step MERGE or PROC SQL JOIN, require exact matches on key variables, which is challenging in datasets containing errors. Our paper compares fuzzy matching and clustering analysis within SAS, assessing their effectiveness in reconciling datasets with inconsistencies in date variables. We simulate a longitudinal dataset of approximately 2,000 observations, representing about 500 patients with repeated measurements. The dataset is used to simulate two datasets including normally (or uniformly) distributed errors on date, manually introduced errors (e.g., typing "12" as "21"), and missing date information (e.g., entering "06/23" instead of "12/06/2023"). For each scenario, we use fuzzy matching and clustering analysis to merge two datasets, evaluating the accuracy of each technique. Preliminary results show varied effectiveness depending on the type of error on the date variable. For datasets with normally (or uniformly) distributed errors on date, clustering analysis significantly outperforms fuzzy matching with a 94.9% accuracy rate compared to 54.1%. In the case of manually introduced errors, both methods achieve high accuracy, around 98%. However, for datasets with missing date information, fuzzy matching is more effective, attaining an 84.4% accuracy rate as opposed to 45.2% for clustering analysis. The paper concludes with a discussion of these findings, offering insights for researchers on selecting appropriate methods for merging datasets with errors on date.
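To make the fuzzy-matching side of the comparison concrete, here is a minimal sketch. It assumes a simple tolerance-window rule: within the same patient, pair each record with the closest unused record whose date differs by at most `max_days`. The paper works within SAS and does not spell out its matching rule in this abstract, so the function name, the tolerance, and the sample records below are illustrative assumptions.

```python
from datetime import date

def fuzzy_match(left, right, max_days=3):
    """Pair each (patient_id, date) in left with the closest unused record in
    right for the same patient, if the dates differ by at most max_days."""
    matches = []
    used = set()
    for pid, d in left:
        best, best_gap = None, None
        for j, (rpid, rd) in enumerate(right):
            if rpid != pid or j in used:
                continue
            gap = abs((rd - d).days)
            if gap <= max_days and (best_gap is None or gap < best_gap):
                best, best_gap = j, gap
        if best is not None:
            used.add(best)
        # Unmatched records are kept with None so they can be reviewed later.
        matches.append(((pid, d), right[best] if best is not None else None))
    return matches

# Assumed example: patient 1 is off by one day, patient 2 by ten days.
left = [(1, date(2023, 12, 6)), (2, date(2023, 12, 10))]
right = [(1, date(2023, 12, 7)), (2, date(2023, 12, 20))]
result = fuzzy_match(left, right)
```

An exact-match merge would pair neither record here. The tolerance window recovers the first pair and leaves the second unmatched.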

AP-349 : Just Stringing Along: FIND Your Way to Great User-Defined Functions
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

SAS® provides a vast number of functions and subroutines (sometimes referred to as CALL routines). These useful scripts are an integral part of the programmer's toolbox, regardless of the programming language. Sometimes, however, pre-written functions are not a perfect match for what needs to be done, or for the platform on which the required work is being performed. Luckily, SAS has provided a solution in the form of the FCMP procedure, which allows SAS practitioners to design and execute User-Defined Functions (UDFs). This paper presents two case studies for which the character or string functions SAS provides were insufficient for work requirements and goals, and demonstrates the design process for custom functions and how to achieve the desired results.

AP-361 : Efficient Repetitive Task Handling in SAS Programming Through Macro Loops
Chary Akmyradov, Arkansas Children's Research Institute

This paper delves into the optimization of repetitive tasks in SAS programming, a common challenge faced by data analysts and programmers. The primary focus is on harnessing the power of SAS macro programming techniques, specifically through the implementation of do loops within macros. Initially, the paper introduces the basics of SAS macros, outlining their significance in automating repetitive sequences of code, and providing a foundational understanding of macro variables and syntax. The discussion then progresses to the implementation of simple do loops within macros, highlighting their practicality in routine data manipulation tasks. Through a series of practical examples and use-case scenarios, the paper demonstrates the effectiveness of these loops in real-world applications. Addressing the limitations of these simple implementations, the paper further explores the generalization of do loops, presenting advanced methods to create dynamic, parameter-driven macros capable of handling a variety of tasks and parameters. This advanced approach is exemplified through complex scenarios and case studies, showcasing the adaptability and efficiency of generalized do loops in diverse data analysis contexts. By the conclusion, the paper provides a comprehensive insight into the role of macro programming in SAS, offering a valuable resource for SAS programmers seeking to streamline their coding workflow and enhance efficiency in data processing tasks. This work not only serves as a practical guide for current SAS users but also contributes to the broader conversation on the future of macro programming in data analysis.

AP-411 : Running the CDISC Open Rules Engine (CORE) in BASE SAS
Lex Jansen, CDISC

CDISC Conformance Rules are an integral part of the Foundational Standards and serve as the specific guidance to Industry for the correct implementation of the Standards in clinical studies. The overall goal of the CORE Initiative is to provide a governed set of unambiguous and executable Conformance Rules for each Foundational Standard, and to provide an open-source execution engine for the executable Rules which are available from the CDISC Library. The source code of the CORE engine is available on the GitHub repository. A CLI (Command Line Interface) is available on the repository which allows users to run the rules under Windows, Mac, and Linux. If users want to run the Engine in their own Python environment or tooling, it can be implemented as it is available on PyPi (Python Package Index). For SAS users it is not always an option to run applications as a Command Line Interface. The presentation will begin with a brief overview of the CDISC CORE concept. The CORE Engine will then be covered. Then the presentation will describe a proof of concept where the CDISC CORE CLI commands have been implemented into SAS processes as Python functions in PROC FCMP, passing parameters and code to the Python interpreter and returning the results to SAS. These Python functions can be called and executed by user-defined SAS functions, which can be called from the DATA step or any context where SAS functions are available.

AP-420 : Generation of Synthetic Data for Clinical Trials in Base SAS using a 2-Phase Discrete-Time Markov and Poisson Rare Event Framework
Adam Yates, Data Coordinating and Analysis Center (DCAC), HJF-MHRP
Misti Paudel, Brigham and Women's Hospital Division of Rheumatology, Inflammation, and Immunity, Harvard School of Medicine
Fengming Hu, Data Coordinating and Analysis Center (DCAC), HJF-MHRP

Synthetic data for clinical trials independent of human participants has growing utility in clinical and epidemiologic fields, but a persistent concern has been the viability and reliability of producing synthetic data which conforms to the complex nature of biomedical data. Recent successes in synthetic clinical trial data include the use of Synthetic Control Arm (SCA) applications, but the generation of treatment-related data necessarily faces additional scrutiny. While synthetic data cannot replace trial data for scientific discovery, planning and development phases of clinical trials can benefit from the use of synthetic treatment and control data. This paper describes a novel program developed in base SAS which generates synthetic data that was used in clinical trial development, design, and report programming. We developed a stochastically grounded process which generates synthetic data of population-specific enrollment characteristics, as well as longitudinal local and systematic reactogenicity, unsolicited events, and adverse events. We implement a discrete-time Markov process framework to generate longitudinal observation time, incorporating a Poisson-based probability of events within each state. This 2-phase stochastic generation process results in data across observation time which conform to biologically natural and realistic behaviors. Key to our process is that reaction frequency may be modulated based on expert experience or historical expectations, but the generated data do not rely directly on existing clinical data. Potential applications and extensions in a machine learning context will be discussed. This paper is intended for individuals with an interest in clinical trial data and a basic to intermediate command of SAS Macro processing.

AP-424 : Adding the missing audit trail to R
Magnus Mengelbier, Limelogic AB

The R language is used more extensively across the Life Science industry for GxP workloads. The basic architecture of R makes it near impossible to add a generic audit trail method and mechanism. Different strategies have been developed to provide some level of auditing, from logging conventions to file system audit utilities, but each has its drawbacks and lessons learned. The ultimate goal is to provide an immutable audit trail compliant with ICH Good Clinical Practice, FDA 21 CFR Part 11 and EU Annex 11, regardless of the R environment. We consider different approaches to implement auditing functionality with R and how we can incorporate an audit trail functionality natively in R or with existing and available external tools and utilities that completely supports Life Science best practices, processes and standard procedures for analysis and reporting. We also briefly consider how the same principles can be extended to other languages such as Python, SAS, Java, etc.

Data Standards

DS-109 : Analyzing your SAS log with user defined rules using an app or macro.
Philip Mason, Wood Street Consultants

SAS provides some pretty basic help with the logs that are produced, typically just linking to errors and warnings. Many people build log checkers to look for particular things of interest in their logs, which usually involves capturing the log and then running some SAS code against it. I made a way to define rules in JSON format which can be read by a SAS macro and used to look for things in a log. This means different rules can be used for different use cases. They can be used via a macro or via a web application I built. The web app can switch between rules, provides summaries, draws diagrams of the code, provides performance stats, and more. Hopefully this functionality might one day be built into SAS, but in the meantime it works well as an addition.
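The rule-driven approach can be sketched as follows. The author's implementation is a SAS macro and a web app. This Python version only illustrates the concept, and the rule schema (`name`, `pattern`, `severity`) is a hypothetical one invented for the example, not the format used in the paper.

```python
import json
import re

# Hypothetical rule file: each rule is a regular expression plus metadata.
RULES_JSON = """
[
  {"name": "error", "pattern": "^ERROR", "severity": "high"},
  {"name": "uninitialized", "pattern": "^NOTE: Variable .* is uninitialized", "severity": "medium"}
]
"""

def check_log(log_text, rules):
    """Return (line number, rule name, severity) for every rule hit in the log."""
    findings = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for rule in rules:
            if re.search(rule["pattern"], line):
                findings.append((lineno, rule["name"], rule["severity"]))
    return findings

rules = json.loads(RULES_JSON)
sample_log = (
    "NOTE: Variable x is uninitialized.\n"
    "ERROR: File WORK.A.DATA does not exist.\n"
    "NOTE: DATA statement used."
)
findings = check_log(sample_log, rules)
```

Keeping the rules in a separate JSON document means a different rule set can be swapped in for each use case without touching the checking code.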

DS-130 : SDTM Specifications and Datasets Review Tips
Wanchian Chen, AstraZeneca

SDTM requirements are spread across various sources such as the SDTM Implementation Guide (SDTMIG) domain specifications section, the SDTMIG domain assumptions section, and the FDA Study Data Technical Conformance Guide. While Pinnacle 21 can assist in identifying issues with SDTM data, it is important to note that data is often limited at the early stages of a study. The most efficient process would be to review SDTM specifications before the creation of SDTM programs, to minimize program modifications and save time. Programmers often seek guidance on conducting a comprehensive review of SDTM but are unsure where to start. In this presentation, I will provide a concise summary of frequently seen findings, both domain-specific and general, observed in multiple studies when reviewing SDTM. I will show which issues can be seen in the Pinnacle 21 report and which ones are missed. I will also cover situations where variables are not applicable to your study, but still may pass Pinnacle 21 checks. This presentation is designed to benefit programmers involved in the SDTM review process.

DS-150 : Assurance in the Digital Age: Automating MD5 Verification for uploading data into a Cloud based Clinical Repository
Laura Elliott, SAS Institute Inc.
Ben Bocchicchio, SAS

Utilization of a cloud-based repository has become increasingly common with large clinical trials. Verifying the integrity of data moved into the cloud for clinical trials is of utmost importance. Normally, this process requires manual intervention to verify that the local source data matches the data stored in the cloud-based system. This paper discusses a process that will automate the creation of a verification report comparing md5 checksums from source to destination. The process, written in Python, generates a .csv file of checksums from the source data, then uses an input file containing the folder paths to be uploaded to the cloud via REST APIs to migrate the data. The source md5 checksums are also uploaded. The Python code then calls the REST APIs to execute a script in the cloud which compares the source and destination md5s using SAS code. The result of the process is a .pdf report that summarizes the comparison of the source and destination md5 checksums. This process offers a completely automated way to prove data integrity for migration of local source data into a cloud-based clinical repository.
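The checksum step can be sketched with the Python standard library alone. This is a minimal local illustration, not the authors' code. It leaves out the REST upload and the SAS comparison in the cloud, and the function names and manifest column names are assumptions.

```python
import csv
import hashlib
import tempfile
from pathlib import Path

def md5_of_file(path, chunk_size=65536):
    """Compute a file's md5 in chunks so large datasets are not read at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(folder, manifest_path):
    """Write a .csv of relative path and md5 for every file under folder."""
    folder = Path(folder)
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["relative_path", "md5"])
        for path in sorted(p for p in folder.rglob("*") if p.is_file()):
            writer.writerow([path.relative_to(folder).as_posix(), md5_of_file(path)])

def compare(source, destination):
    """Return paths whose checksum is missing or different at the destination."""
    return sorted(p for p, h in source.items() if destination.get(p) != h)

# Demo with a temporary folder standing in for the local source data.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source"
    src.mkdir()
    (src / "dm.csv").write_bytes(b"hello")
    manifest = Path(tmp) / "manifest.csv"  # kept outside the scanned folder
    write_manifest(src, manifest)
    with open(manifest, newline="") as fh:
        source_sums = {row["relative_path"]: row["md5"] for row in csv.DictReader(fh)}

# A destination checksum that does not match is reported as a mismatch.
mismatches = compare(source_sums, {"dm.csv": "0" * 32})
```

In a real migration the destination dictionary would be built from checksums computed on the cloud side. Here it is hard-coded to show how a mismatch is reported.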

DS-154 : Exploit the Window of Opportunity: Exploring the Use of Analysis Windowing Variables
Richann Watson, DataRich Consulting
Elizabeth Dennis, EMB Statistical Solutions, LLC
Karl Miller, IQVIA

For analysis purposes, dataset records are often assigned to an analysis timepoint window rather than simply using the visits or timepoints from the collected data. The rules for analysis timepoint windows are usually defined in the Statistical Analysis Plan (SAP) and can involve complicated derivations to determine which record(s) best fulfils the analysis window requirements. For traceability, there are ADaM standard variables available to help explain how records are assigned to the analysis windows. This paper will explore these ADaM variables and provide examples on how they may be applied.

DS-188 : Automated Harmonization: Unifying ADaM Generation and Define.xml through ADaM Specifications
Wei Shao, Bristol Myers Squibb
Xiaohan Zou, Bristol Myers Squibb

In electronic submission packages, ADaM datasets and Define.xml stand as pivotal components. Ensuring consistency between these elements is critical. However, despite this importance, the current method still heavily depends on manual checks. To address this challenge, we introduce an innovative automated approach driven by ADaM specifications. Our solution involves a suite of SAS® macros engineered to streamline the translation from ADaM specification to both ADaM datasets and Define.xml. These macros orchestrate a seamless automation process, facilitating the generation of ADaM datasets while concurrently fortifying consistency between ADaM datasets and Define.xml. The automated processes include format creation, core variable addition, variable attributes generation, dynamic length adjustment based on actual values, and automatic ADaM specification updates from actual data. These macros act as dynamic tools, constructing datasets with precision, adjusting variable attributes, and most importantly, syncing Define.xml with actual data. Our automated tool system not only expedites ADaM datasets creation but also ensures an inherent consistency with Define.xml. This amalgamation of automation and specification-based integrity significantly reduces manual errors, enhances data quality, and fortifies the efficiency of the submission process.

DS-193 : Around the Data DOSE-y Doe, How Much Fun Can Your Data Be: Using DOSExx Variables within ADaM Datasets
Inka Leprince, PharmaStat, LLC
Richann Watson, DataRich Consulting

In the intricate dance of clinical trials that involve multiple treatment groups and varying dose levels, subjects pirouette through planned treatments - each step assigned with precision. Yet, in the realms of pediatric, oncology, and diabetic trials, the challenge arises when planned doses twirl in the delicate arms of weight adjustments. How can data analysts choreograph Analysis Data Model (ADaM) datasets to capture these nuanced doses? There is a yearning to continue with the normal dance routine of analyzing subjects based on their protocol-specified treatments, yet at times it is necessary to learn a new dance step, so as not to overlook the weight-adjusted doses the subjects actually received. The treatment variables TRTxxP/N in the Subject-Level Analysis Dataset (ADSL) and their partners TRTP/N in Basic Data Structure (BDS) and Occurrence Data Structure (OCCDS) are elegantly designed to ensure each treatment glides into its designated column in the summary tables. But we also need to preserve the weight-adjusted dose level on a subject- and record-level basis. DOSExxP and DOSExxA, gracefully twirl in the ADSL arena, while their counterparts, the dashing DOSEP and DOSEA, lead the waltz in the BDS and OCCDS datasets. Together, these harmonious variables pirouette across the ADaM datasets, capturing the very essence of the weight-adjusted doses in a dance that seamlessly unfolds.

DS-204 : ADaM Discussion Topics: PARQUAL, ADPL, Nadir
Sandra Minjoe, ICON PLC

This paper and presentation will cover three topics that have been under varying levels of discussion within the CDISC ADaM team but are not part of the standard. First is the parameter-qualifier variable PARQUAL, which can be found in a couple Therapeutic Area User Guides (TAUGs), went out for public review as part of ADaMIG v1.2, but currently breaks BDS rules because it never made it into a final publication. Second is ADPL, a one-record-per-subject-per-participation dataset that might be useful for studies where subjects can enroll more than once or have multiple screening attempts, similar to the proposed SDTM DC domain. Third is Nadir variables, like Change from Nadir and Percent Change from Nadir, not currently allowed in a BDS structure. In each case, the paper and presentation will summarize CDISC ADaM team discussions and give personal (not CDISC-authorized) recommendations of when and how to implement these concepts in order to meet analysis needs.

DS-205 : A New Way to Automate Data Validation with Pinnacle 21 Enterprise CLI in LSAF
Crystal Cheng, SAS

Pinnacle 21 Enterprise is software that checks data compliance with CDISC standards, controlled terminology, and dictionaries when users prepare clinical data submissions for regulatory agencies. Validating clinical data early and frequently during the conduct of the clinical trial helps users discover and address data issues in advance, ensuring the quality of the submission data. There are different ways to execute validations in P21 Enterprise. Users can either run the validation manually via the P21 user interface or, for a more automated process, execute a process flow in SAS Life Science Analytics Framework (LSAF) to invoke the Enterprise Command Line Interface (ECLI) from P21. Integrating LSAF with P21 and setting up the validation process via a process flow saves programmers time and is less prone to errors during packaging and uploading of datasets for P21 validation. This paper will focus on the detailed steps to set up the automated process flow of the Pinnacle 21 validation in SAS Life Science Analytics Framework (LSAF) and explore the benefits of automating the validation process.

DS-214 : Data Quality Framework application for Data management using R Shiny
Sujana Katta, Tech Observer UK LTD
Srinivas Macherla, Tech Observer UK Ltd

A clinical data quality framework is a primary and pivotal step in generating data insights and supporting decision making. The burden and responsibility of maintaining data quality fall on data managers from the start to the end of the study, and a strategic and stringent set of actions, such as data review, issue tracking, communication/resolution, and documentation, must be carried out throughout the study. This has motivated us to develop an R Shiny application, deployable by the end user, to support the data management team and SDTM mappers. The R Shiny app will be a one-stop solution for regular edit checks, user-defined checks, CDISC compliance checks, and visualization of outliers, with raw data as the input. Usually, edit check programming and SDTM custom-defined checks determine data quality during the data integrity activities of the data extraction phase. This app provides additional features to effectively monitor database cleaning and enhances SDTM mapping quality, which would minimize the time spent on custom checks in SDTM mapping.

DS-271 : Programming Considerations in Deriving Progression-Free Survival on Next-Line Therapy (PFS2)
Alec McConnell, BMS
Yun Peng

Historically, oncology clinical trials have relied on Overall Survival (OS) and Progression Free Survival (PFS) as primary efficacy endpoints. While OS is often the most desired estimate, it requires many years of follow-up to derive an unbiased estimate from the study. Additionally, even with follow-up, OS estimates are subject to confounding due to subsequent therapies which are commonplace in the treatment of cancer. As a proxy for OS, the EMA has recommended the evaluation of Progression Free Survival 2 (PFS2). According to the EMA, "PFS2 is defined as the time from randomization (or registration, in non-randomized trials) to second objective disease progression, or death from any cause, whichever first." In spite of this definition, PFS2 requires complex data collection and derivation. Within our oncology team at Bristol-Myers Squibb (BMS), different studies approach the derivation differently. In this paper, we will share how our team at BMS collects the relevant data to derive the PFS2 endpoint with a consistent approach in both the advanced and early settings. Furthermore, we will explain how we structure our ADaM datasets to assist in our derivation of the endpoint.
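
As a rough illustration of the quoted EMA definition only (not the authors' derivation, which the abstract does not describe), the event time can be sketched in Python as the earlier of second progression or death, measured from randomization. The function name and arguments are hypothetical, no "+1 day" convention is applied, and censoring rules are deliberately left out.

```python
from datetime import date
from typing import Optional


def pfs2_days(randomization: date,
              second_progression: Optional[date],
              death: Optional[date]) -> Optional[int]:
    """Days from randomization to the earlier of second objective
    progression or death from any cause.

    Returns None when neither event was observed; how such subjects
    are censored is study-specific and not handled in this sketch.
    """
    # Keep only the events that actually occurred.
    events = [d for d in (second_progression, death) if d is not None]
    if not events:
        return None
    # "Whichever first": take the earliest observed event date.
    return (min(events) - randomization).days


# Hypothetical subject: second progression precedes death.
example = pfs2_days(date(2020, 1, 1), date(2020, 3, 1), date(2020, 6, 1))
```

A real implementation would also need to identify which progression counts as the second one and to assign censoring dates, which is where the complexity mentioned in the abstract arises.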

DS-274 : Guidance Beyond the SDTM Implementation Guide
Kristin Kelly, Pinnacle 21 by Certara
Michael Beers, Pinnacle 21

A common misconception among preparers of SDTM data seems to be that it is sufficient to just follow the SDTM Implementation Guide when creating the datasets. The truth is that it is more complicated than that. A preparer of SDTM datasets needs to be aware of all the industry guidance available when preparing for regulatory submission, not only from CDISC and the regulatory agencies but also from other organizations. This presentation will discuss some of the lesser-known guidance documents in the industry and why they should be referenced, as well as some of the impacts of not using these documents in the creation of SDTM datasets.

DS-276 : Your Guide to Successfully Upversioning CDISC Standards
Soumya Rajesh, CSG Llc. - an IQVIA Business

As of 2023, newer versions of the CDISC standards (i.e., SDTM v2.0, SDTMIG v3.4, SDTM v1.7, SDTMIG v3.3, and Define.xml v2.1) are either required or supported by the industry's regulatory agencies. This paper relays challenges and best practices the authors have experienced while up-versioning to these standards. Not all these practices are found in published standards. This paper will bring together the resources and lessons learned in one place, so that readers can skillfully navigate through the challenges of adopting these new standards. Highlights include strategies for dealing with value level metadata for variables with multiple codelist references, a new domain class, new domains, and domains referenced in TAUGs not seen in the IGs. We'll discuss best practices for data modeling: when to use new variables, supplemental qualifiers, and targeting the appropriate domains. We'll include experiences interpreting and dispositioning validation output from the applicable conformance rules.

DS-280 : I Want to Break Free: CRF Standardization Unleashing Automation
Laura Fazio, Formation Bio
Andrew Burd, Formation Bio
Emily Murphy, Formation Bio
Melanie Hullings, Formation Bio

Achieving efficient and impactful Case Report Form (CRF) standardization in the pharmaceutical industry demands intense cross-functional collaboration and a shared understanding of the benefits. This foundation is crucial for improved data quality as well as downstream analysis and reporting automation. Deviations from standards cause manual review, increased errors, and added inefficiencies in downstream code development. To address these challenges, an internal Standards Committee led by Data Management and Systems Analytics teams was formed to gain diverse cross-functional alignment through a comprehensive charter. The charter mandates that study teams adhere to standards during study startup, with deviations requiring justification and approval from the Committee. While CRF standards are typically developed by Medical and Clinical teams, we additionally include roles with a focus on downstream analysis and reporting including our Data Science, Statistical Programming, and Clinical Analytics teams. This paper advocates for an inclusive approach to standards development, emphasizing that resulting datasets should be versatile for all downstream purposes. Such an approach unlocks the power of automation, minimizes reactivity, and fosters efficiency and continuity across clinical studies.

DS-287 : ADaM Design for Prostate Cancer Efficacy Endpoints Based on PCWG3
Lihui Deng, Bristol Myers Squibb
Kylie Fan, BMS
Jia Li, BMS

Unlike other types of solid tumors that use the RECIST 1.1 tumor response criteria, due to the particularity of prostate cancer, some common oncology efficacy endpoints, such as rPFS, ORR, time to response, and duration of response are usually based on the PCWG3 criteria. Additionally, other specific prostate cancer endpoints like PSA response rate and time to PSA progression are also based on PCWG3, involving more complex data collection and derivation than RECIST 1.1. In this paper, we will share efficacy endpoints in prostate cancer, such as PSA response and time to PSA progression. We will explain the ADaM design and data flow, and how to ensure traceability and data dependency in derivation. We successfully implemented programming for these complex endpoints, enhancing the speed and quality of effective analysis through the development of macros.

DS-305 : Guideline for Creating Unique Subject Identifier in Pooled studies for SDTM
Vibhavari Honrao, NMIMS University, Mumbai

The Demographic dataset is the parent dataset, which includes a set of essential standard variables that describe each subject in a clinical study. One of these key variables is the Unique Subject Identifier (USUBJID). The SDTM IG does not provide any guidance on the creation of USUBJID for pooled studies. Hence, statistical programmers need to understand the programming steps involved. In clinical trials, there are cases in which subjects are re-enrolled in different studies for the same compound, and it can be difficult to identify the subject while maintaining CDISC compliance. For ISS analysis, pooling of studies becomes challenging due to multiple values of SUBJID, RFICDTC, RFSTDTC, RFENDTC, etc. within the same USUBJID from different studies. This paper demonstrates the steps and programming logic involved in developing the Demographic dataset, using hypothetical examples from multiple studies, and creates pooled datasets.

DS-310 : Converting FHIR to CDASH using SAS
Pritesh Desai, SAS
Mary Liang, SAS

With the growing diversity of standards for collecting and presenting Real World Evidence (RWE), there is an escalating demand for the conversion of these standards into more actionable datasets. This paper demonstrates the transformation from FHIR (Fast Healthcare Interoperability Resources) to CDASH using various methods within SAS Viya. The outlined methods are easily adaptable to other standards or datasets initially presented in JSON format. Moreover, recognizing the need for accessible processes, we will highlight the creation of low/no code procedures to enhance access to these updated datasets, including the transformation of conversion work into SAS Viya Custom Steps.

DS-342 : CDISC Therapeutic Area User Guides and ADaM Standards Guidance
Karin LaPann, CDISC independent contractor

Among the frequently overlooked yet immensely valuable resources for implementing standards are the CDISC Therapeutic Area User Guides (TAUGs). Presently the CDISC website hosts 49 of these guides, 23 of which incorporate ADaM sections. These guides are created by groups of CDISC standards volunteers across the industry and include medical professionals and researchers with experience in the respective disease areas. In the first few years of development, these TAUGs concentrated on the collection of the data and the implementation of the SDTM to contain it. In 2014 the first TAUG with an analysis section using ADaM was published. Many TAUGs are developed with additional implementation of the analysis datasets, with ADaM-compliant examples. This helps the programming community by illustrating how the SDTM datasets are further arranged for analysis. The latest initiative has been to expand these TAUGs through grants by organizations representing various diseases. One of these is the recently released Rare Diseases Therapeutic Area User Guide, partially sponsored by a grant from the National Organization for Rare Disorders (NORD). This paper will describe the TAUGs developed with ADaM standards, highlighting their distinctions from prior versions. We will suggest how to use the TAUGs as a reference for conducting studies within various disease areas.

DS-353 : Protocol Amendments and EDC Updates: Downstream impact on Clinical Trial Data
Anbu Damodaran, Alexion Pharmaceuticals
Ram Gudavalli, Alexion Pharmaceuticals
Kumar Bhimavarapu, Alexion Pharmaceuticals

This paper investigates the impact of continuous database updates during ongoing studies, particularly emphasizing EDC migrations and Protocol amendments. Through examination of practical examples, it reveals the cascading effects on CDISC datasets, as well as the resulting modifications in reporting. Moreover, the paper scrutinizes the downstream impacts of subject transfers across studies or sites, uncovering intricacies related to re-screening subjects who initially did not meet inclusion/exclusion criteria. By unraveling the complexities of these processes, the paper offers valuable insights to improve data integrity and ensure compliance with regulatory guidelines in clinical research.

DS-360 : A quick guide to SDTM and ADaM mapping of liquid Oncology Endpoints.
Swaroop Kumar Koduri, Ephicacy Lifescience Analytics Pvt Ltd
Shashikant Kumar, Ephicacy Lifescience Analytics
Sathaiah Sanga, Ephicacy Lifescience Analytics

Cancer is a disease where some of the body's cells mutate, grow out of control, and spread to other body parts. The mutated cells possess the ability to infiltrate and destroy healthy tissue all over the body. Liquid tumors (blood cancers) commonly occur in the bone marrow and the lymphatic system. In oncology clinical trials, response and progression are key to measuring survival and remission rates. In accordance with the response criteria guidelines, oncology studies are divided into one of three subtypes. The first subtype, the solid tumor study, usually follows RECIST (Response Evaluation Criteria in Solid Tumors) or irRECIST (immune-related RECIST). The second subtype, the lymphoma study, usually follows Cheson 1997 or 2007. Lastly, leukemia studies follow study-specific guidelines (e.g., IWCLL for Chronic Lymphocytic Leukemia). This paper will focus on the blood cancers (lymphoma and leukemia) and show, with examples, how SDTM and ADaM domains are used to collect the different data points in each type. This paper will also show how standards are used to capture disease response and how CDISC will streamline the development of clinical trial artifacts in liquid oncology studies.

DS-367 : Handling of Humoral and Cellular Immunogenicity Data in SDTM
Wei Duan, Moderna Therapeutics

Immunogenicity endpoints have been broadly used and examined in clinical vaccine studies as the key assessment of immune response to viral infection. On one hand, humoral immunogenicity, including serum neutralizing and binding antibodies, has been migrated to the IS domain per SDTMIG v3.4, as this domain is primarily used to map molecules targeting antibody recognition, binding, and testing. During the migration, we faced some challenges when piecing various data sources together, from vendor-tested immunogenicity results and sample collection data in EDC to central lab collected sample data, with some complexity in sorting out data flows, reconciling discrepancies, and mapping out-of-bound values. On the other hand, there is a trend for clinical vaccine studies to take a deeper dive into cellular response by using CMI (Cell Mediated Immunity) data. Cytokines produced by monocytes, T cells, and lymphokines play important roles in regulating immune functions and developing antiviral immune functions. This paper will discuss the proper domains for both types of immunogenicity data, along with some challenges in mapping to CDISC- and submission-compliant SDTM datasets, based on our experience.

DS-374 : Implementation of composite estimands for responder analysis based on change from baseline in non-solid tumours
Reshma Radhakrishnan, Anna University

ICH E9 (R1), "Estimands and Sensitivity Analysis in Clinical Trials," was published in August 2017. An estimand is a precise description of the treatment effect reflecting the clinical question posed by a given clinical trial objective. The description of an estimand must be based on how intercurrent events affect the clinical question of interest. Intercurrent events (ICEs) are events occurring after treatment start that affect either the interpretation or the measurement associated with the clinical question of interest. Examples of ICEs include death, switch to another treatment, and treatment discontinuation. There are five strategies for addressing ICEs: the treatment policy strategy, hypothetical strategies, composite variable strategies, while-on-treatment strategies, and principal stratum strategies. Since regulatory authorities endorsed them in 2020, estimands have gained popularity and are now included in SAPs. It is therefore helpful to understand estimands and how to efficiently construct datasets that support efficacy analysis and ensure correct handling of ICEs. This paper aims to provide its audience with a simplified way to design datasets and variables for composite estimands for responder analysis based on change from baseline in non-solid tumours. It also covers how to carefully handle ICEs in missing data analysis using the SAS procedures PROC STDRATE and PROC MI. The STDRATE procedure computes directly standardized rates and risks for study populations using Mantel-Haenszel estimates, whereas PROC MI is used to replace missing values with multiple imputation.

DS-388 : Advancing the Maturation of Standardized CRF Design
Rubha Raghu, Algorics
Sugumaran Muthuraj, Algorics
Vijayakumar Radhakrishnan, Algorics
Nithiyanandhan Ananthakrishnan, Algorics

Effective data collection using Case Report Forms (CRFs) can contribute significantly to the success of a clinical trial. The evolution of standardized CRF design has been a crucial advancement in bettering clinical trial outcomes. The SDTM standardization process is tremendously time-consuming due to challenges like data inconsistencies, absent data, incomplete information, and sensitivity to case variations. Standardizing at the CRF level can mitigate the necessity for revisions and elevate the overall quality of data. Employing Artificial Intelligence (AI) to automate the field collection process in CRF can act as the much-needed fail-safe against inappropriate data collection, thereby enhancing accuracy and reliability. Artificial intelligence, machine learning, and advanced data visualization techniques play an integral role in supporting positive research and patient outcomes, including improved data quality, efficiency, and regulatory compliance. In this paper, we will discuss the maturation of standardized CRF design with automation tools, optimizing data collection methodologies, and maintaining a dynamic approach to CRF design standardization for enhanced efficiencies and data quality.

DS-398 : Streamlining Patient-reported outcome (PRO) data standardization & analysis
Varsha Mithun Patil, Ephicacy
Mrityunjay Kumar, Ephicacy Lifescience Analytics

Patient-reported outcome (PRO) measures are one type of clinical outcome assessment (COA) measure; in contemporary clinical trials they aim to capture health-related quality of life from a patient's perspective, without the interpretation of caregivers. Today, the preferred mode for the collection of patient-reported outcome (PRO) data in clinical research is electronic. This preference is largely driven by factors such as the enhancements to data quality that ePRO data collection affords, real-time monitoring, less missing data, and the possibility of immediate interactions. These quality enhancements are compromised by inconsistent data structures and non-adherence to established data standards. Currently, it is not mandatory for PRO data to follow a data standard; although many pharma companies are adopting the CDISC standard and FDA guidelines for PRO data submissions, data models often vary by sponsor. This lack of consistency creates risks for programming and analysis, and difficulties for analytics functions generating the required analysis and submission datasets. The intent of this paper is to provide available information and guidelines on PRO analysis, mainly in the Oncology and Neuroscience therapeutic areas. This paper primarily focuses on addressing these issues by discussing and suggesting best practices, such as adopting CDISC standards at the source within the ePRO data platform. We will also discuss various types of PRO measures, available SDTM standards, and ADaM structure based on objectives and endpoints for PRO data analysis.

DS-400 : AI and the Clinical Trial Validation Process - Paving a Rocky Road
Steve Ross, NA
Ilan Carmeli, NA

The validation of outputs in a clinical research environment acts as a guarantor process, confirming the accuracy and validity of the trial results, the investment of doctors, patients, and caregivers in the efficacy and utility of the trial, and the reputation of the sponsor and/or CRO conducting the validation. Double programming has done this heavy lifting for decades. The increasing application of AI (ML and NLP) gives statisticians and programmers an unprecedented opportunity to apply this technology wherever validation takes place: during the development cycle, to aid teams in getting to the right output sooner, and at the end of a study, to check that tables in a package match what is expected for the deliverable. This paper shows how to use AI and automation capabilities within Verify for activities including validating that the titles, footnotes, format, and content of the output match the display in the mock shells, and whether big N's, small n's, and counts in the body of a display are logical and accurate within and across tables. This paper also illustrates a documentary audit trail that captures the end-to-end decision-making process and feedback from contributors as tables are revised from early deliverables to final. Automating critical iterative tasks can free up validation time and brainpower for statisticians and programmers to focus on the bespoke aspects of clinical trials, such as primary efficacy endpoints, complex algorithms and analyses, and ensuring that the truth of the data is told, paving the rocky road to the final product.

DS-406 : Game changer! The new CDISC ADaM domain ADNCA for PK/PD data analysis
Santosh Ranjan, IQVIA

Characterizing the Pharmacokinetic (PK) or Pharmacodynamic (PD) profile of novel drugs is an integral part of the drug approval process, often performed by utilizing Non-Compartmental Analysis (NCA) as the standard methodology in clinical trials. Unlike other analyses usually performed by a team of Biostatisticians and Statistical Programmers in SAS®, NCA requires cross-functional collaboration involving a Pharmacokineticist or Pharmacometrician who may perform the NCA in another specialized scientific application like Phoenix WNL®, NONMEM®, or similar. Performing NCA requires an input file with a structure and format compatible with the NCA application of choice, merging several study variables as well as deriving certain NCA-specific variables. The timing and process of generating this input file was highly subjective and varied widely across organizations until November 2021, when CDISC released and regulated the new ADaM ADNCA domain and required its inclusion in the submission package for studies where NCA is performed. This paper will illustrate the end-to-end process of PK/PD profiling using NCA, utilizing the newly released ADNCA domain as the input file in compliance with the implementation guidelines. The working relationship of ADNCA with other related domains such as SDTM PC, SDTM PP, ADaM ADPC, and ADaM ADPP, and how they are collectively leveraged to complete the prescribed analysis of PK/PD endpoints for the study, will also be discussed.

Data Visualization and Reporting

DV-127 : The Missing(ness) Piece: Building Comprehensive, Data Driven Missingness Reports and Codebooks Dynamically
Louise Hadden, Abt Associates Inc.

Reporting on missing and/or non-response data is of paramount importance when working with longitudinal surveillance, laboratory, serological, and medical record data. Reshaping the data over time to produce "missingness" statistics is a tried and true technique, but through using metadata and little known variations of familiar SAS procedures, combined with clever ODS reporting techniques, there's an easier way. This paper and presentation will speed up your data cleaning reconnaissance and reporting, and help you find your missing(ness) piece. Additionally, the same techniques will be used to demonstrate how to create robust and utile data dictionaries.

DV-155 : Combining Functions and the POLYGON Plot to Create Unavailable Graphs Including Sankey and Sunburst Charts
Jeffrey Meyers, Regeneron Pharmaceuticals

SAS currently has an expansive array of plot types available within the SG procedures and Graph Template Language, but there are still charts that are too complex to create using conventional means. Fortunately, SAS has a powerful trick up its sleeve to allow the ambitious programmer to design these charts: the polygon plot. A polygon plot allows the programmer to use geometric, trigonometric, and other equations to manually produce x and y coordinates that form the perimeter around a shape with the option to fill in the shape with color. This paper will walk through utilizing the polygon plot to create circular, twisting, and other complex graphs such as the Sankey diagram and sunburst chart.
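
To give a feel for the kind of coordinate generation described above, here is a minimal Python sketch (not the author's SAS code) that uses trigonometric equations to compute the perimeter of one ring segment, the basic building block of a sunburst chart. The function name and parameters are hypothetical; the resulting x/y pairs are what one would pass to a polygon-drawing routine.

```python
import math


def annular_sector(r_inner, r_outer, start_deg, end_deg, steps=20):
    """Return (x, y) vertices tracing the perimeter of a ring segment.

    The outer arc is walked from start_deg to end_deg, then the inner
    arc is walked back, so the vertices form one closed shape.
    """
    angles = [math.radians(start_deg + (end_deg - start_deg) * i / steps)
              for i in range(steps + 1)]
    outer = [(r_outer * math.cos(a), r_outer * math.sin(a)) for a in angles]
    inner = [(r_inner * math.cos(a), r_inner * math.sin(a))
             for a in reversed(angles)]
    return outer + inner


# Hypothetical quarter-ring between radius 1 and radius 2.
vertices = annular_sector(1.0, 2.0, 0.0, 90.0, steps=4)
```

Each segment of a sunburst would get its own vertex list and a shared identifier, so that a polygon plot can draw and fill the segments one by one.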

DV-170 : Creating Custom Excel Spreadsheets with Built-in Autofilters Using SAS® Output Delivery System (ODS)
Kirk Paul Lafler, sasNerd

Spreadsheets have become the most popular and successful data tool ever conceived. Current estimates show that there are more than 750 million Excel users worldwide. A spreadsheet's simplicity and ease of use are two reasons for the widespread use of Excel. Additional value-added features also help to expand spreadsheet usage including its collaborative capabilities, customizability, ability to manipulate data, application of data visualization techniques, mobile device usage, automation of repetitive tasks, integration with other software, data analysis, and filtering capabilities using autofilters. This last value-added feature, filtering with autofilters, is the theme for this paper. An example application is illustrated to create a custom Excel spreadsheet with built-in autofilters, or filters that provide the ability to make choices from a list of text, numeric, or date values to find data of interest quickly, using the SAS® Output Delivery System (ODS) Excel destination and the REPORT procedure.

DV-186 : Enhanced Spider Plot in Oncology
Ilya Krivelevich, Eisai Inc.
Cixin He, Eisai Inc.
Binbin Zhang-Wiener, Eisai Inc.
Wenyin Lin, Eisai Inc.

Graphs are an integral part of modern data analysis of clinical trials. Viewing data in a graph together with the tabular results of statistical analysis can greatly improve understanding of the collected data. Graphical data display provides insight into trends and correlations that is simply not possible with tabular data; the visual representation can very often be the most informative way to understand the results. The spider plot of change in tumor size from baseline is one of the more common graphs in oncology studies. Unlike the waterfall graph, which displays the maximum change from baseline for each subject, the spider plot allows us to visualize change from baseline over time. In our experience, spider plots can also display other clinically meaningful information, such as time-point responses, study drug dosage, and some subject-level information, for example, the value of best overall response. This additional information can be very helpful for reviewers. The demonstrations in this paper are based on RECIST 1.1 evaluation criteria and can be easily adapted to any other tumor evaluation criteria.

DV-196 : Comparing SAS® and R Approaches in Creating Multicell Dot Plots in Statistical Programming
Yi Guo, Pfizer Inc.

In clinical studies, comparisons between groups are very common, and researchers are typically interested in significant results or differences. Instead of laboriously examining statistical tables row by row, multicell dot plots that share the same y-axis can provide a more intuitive and efficient way for researchers to pinpoint the results of interest. Both SAS® and R can create high-quality dot plots. In this paper, we will explain and compare the risk difference (RD) within the system organ class (SOC) for two treatment groups using SAS® GTL versus R ggplot2. Furthermore, we will discuss the challenges when using these different programming languages, compare the syntax and functionality of SAS® and R, and summarize which method is more efficient in different aspects. In addition, the paper will explore the SAS® and R code in an application of dot plots in a real-world evidence (RWE) study.

DV-216 : Utilizing Data Visualization for Continuous Safety and Efficacy Monitoring within Early Development
Margaret Wishart, Bristol Myers Squibb
Tamara Martin, Bristol Myers Squibb

Early Development (ED) clinical research requires substantial ongoing data monitoring due to the complexity of patient safety, risk of drug toxicity, and variability of first-in-human (FIH) trials. Historically, clinical decisions were made throughout a drug development life cycle utilizing static statistical tables, data listings, and graphs. Though these outputs offer valuable statistical findings, the cadence of the data presentation is contingent on milestone deliveries. Within the industry, there is a shift to introduce interactive visualizations used for dynamic data review. Data Visualization (DV) is a solution to bridge the gap for the need of Near Real-Time (NRT) data review between major milestones to proactively empower informed data driven decision-making. Throughout this paper we will highlight the impact of a harmonious partnership cultivated by continued collaboration between DV and ED to allow for active safety and efficacy surveillance.

DV-222 : Kaplan-Meier Graph: a comparative study using SAS vs R
Mrityunjay Kumar, Ephicacy Lifescience Analytics
Shashikant Kumar, Ephicacy Lifescience Analytics

Data visualization plays a very important role in data analysis and reporting. With the increasing volume of data, it has become essential to display data in the form of different types of graphs. The Kaplan-Meier graph produced with PROC LIFETEST is well known in clinical SAS programming for displaying time-to-event data, owing to its widespread use in reporting survival analysis. The purpose of this paper is to explain how Kaplan-Meier curves are developed and analyzed in both SAS and R in a stepwise manner. Customizing the graph to include additional parameters often appears challenging in SAS; however, it is comparatively easier in R. With the help of an example based on two small treatment groups of hypothetical data, new users can understand how the process works, which can lead to a pioneering discussion on leveraging open-source software like R. The examples shown also illustrate the crucially important points of the comparative analysis and may provide statistical programmers with an alternative way of creating graphs and help them explore various functionalities in R.

DV-246 : AutoVis Oncology Presenter: Automated Python-Driven Statistical Analysis and Visualizations for Powerful Presentations
Indraneel Narisetty, Jazz Pharmaceuticals

In late-phase or first-in-human clinical studies, understanding clinical data is vital for informed decision-making like selecting the appropriate drug dose and evaluating its efficacy and safety. The traditional process of converting ADaM data sets into TLFs (tables, listings, and figures) and integrating them into clinical PowerPoint presentations has historically been a time-consuming task. Medical monitors and clinical teams create these presentations to conclude on dose selection, escalation, and drug effectiveness. Addressing this need, we've introduced "AutoVis Oncology Presenter," an innovative Python-based tool designed to streamline the transformation of Oncology clinical trials data into clear, impactful PowerPoint presentations. It's particularly adept at handling key ADaM datasets like ADTTE and ADRS, which are crucial for assessing treatment effectiveness. This paper will demonstrate how to build this tool, complete with Python code and practical examples. The goal of AutoVis is to make important safety and efficacy data both comprehensible and visually appealing using Python packages. It automates the generation of detailed tables and striking graphs, such as spider plots showing patient responses, waterfall plots, and swimmer plots, all neatly incorporated into PowerPoint presentations. Moreover, it helps in comparing CSR (Clinical Study Report) tables when they are generated, thereby enhancing the efficiency and clarity of presentations. This feature is particularly beneficial for clinical teams who need to regularly share their findings, be it in meetings, conferences, or reports. AutoVis accelerates the sharing of vital information, thereby advancing our understanding and treatment of cancer.

DV-261 : RWE, Big Data and ML for product innovation in medical devices
Karthik Venkataraman, Algorics
Gayathri Ravikumar, Algorics

As more medical devices need to undergo a clinical evaluation, real-world data plays an important role in product innovation, patient-centricity, and regulatory submissions. In this session on how AI/ML contributes to the continuous improvement of medical devices through the analysis of real-world data, we will discuss a NASSCOM award-winning real-life case, on how AI/ML was utilized to ingest, harmonize, analyze and visualize data from medical devices, resulting in meaningful insights for product innovation and better patient experience. This analysis revealed trends, correlations, and unexpected associations between device usage and patient outcomes. The insights gained provided valuable information for product innovation, helping manufacturers identify areas of improvement in their devices, addressing pain points for patients, and enhancing overall usability. This session will provide you with a comprehensive understanding of how the use of AI/ML to analyse, visualize and interpret real-world data contributes to the continuous improvement of medical devices, ultimately benefiting both manufacturers and patients alike.

DV-278 : Standardization of the Patient Narrative Using a Metadata-driven Approach
Kuldeep Sen, Pfizer

Patient narratives are important components of any Clinical Study Report (CSR) and are attached as "Attachment I" in the CSR. They are part of the FDA's and the ICH's requirements to provide information about safety of a study - specifically the events of death, serious adverse events, and other adverse events of clinical importance at patient level. The patient narratives have become one of the statistical programming deliverables, and creating high-quality narratives can be a challenging task due to several reasons such as unconventional nature of the narrative template, dynamic nature of data at subject level, and limited knowledge of its scope within a statistical programming team. This paper presents a metadata-driven end-to-end approach, which begins with standardization of the narrative template and employs metadata-driven programming automation. This approach has been shown to greatly enhance the consistency and efficiency in creating the patient narratives.

DV-283 : Exploring the Application of FDA Medical Query (FMQ) in Visualizing Adverse Event Data
Tongda Che, Merck
Danfeng Fu, MSD

The analysis and visualization of adverse event (AE) data is critical for evaluating drug safety in clinical trials. This paper explores the application of the FDA Medical Query (FMQ) using R to conduct safety analyses and create meaningful visualizations of AE data. FMQ provides a standardized approach to identifying adverse events of interest. By leveraging FMQ with R, automated safety data workflows can be created to accelerate drug safety reviews. In this paper, adverse event data coded with the MedDRA dictionary was mapped to the FMQ list. The resulting analysis datasets are presented interactively using R Shiny to display various R graphs, such as lollipop plots, pie charts, and circular bar charts. These visualizations offer insights into AE incidence, severity, timing, duration, and relationships.

DV-293 : Splashy Graphics Suitable for Publication? ODS LAYOUT Can Do It!
Dave Hall, BridgeBio Pharma

The TLFs are done. Pinnacle 21 and the CDISC package are done. Questions have been answered. It's all been delivered. You think it's all over when the Marketing people come to you. "We need some fancy graphs and plots for publication in a magazine! A page full of the ones you did for the submission, but shiny and flashy!" You start to say "SAS can't do that. We need some high-end software for what you're talking about," but that's not you. What are you going to do? So you think SAS can't create the kind of copy that Marketing is talking about? Think again. By adding just a few simple ODS LAYOUT statements, the same chunks of code you used for the clinical study report figures can be combined in a program to produce a stunning page of spectacular graphics. If you've already created the figures, you've done the hard part. All that's left is sizing and arranging them, and maybe plugging in some color for pizazz, and Presto! You've got content that's good enough for any magazine or journal, and you didn't even break a sweat.

DV-313 : Visual discovery in Risk-Based Monitoring using topological models
Kostiantyn Drach, Intego Group
Iryna Kotenko, Intego Group

Strong endorsement from the FDA and EMA for efficient oversight of clinical investigations gives risk-based monitoring (RBM) a crucial role in new drug development. The main goal of the RBM approach is to offer strategic and effective ways to allocate resources across a study based on several key indicators, such as data criticality, patient safety, protocol compliance, and others. One component of RBM is Clinical Trial Site Monitoring, which presents a significant challenge because it requires timely generation of insights from site data arriving in almost real time. Because monitoring sites' activities is an important task for ensuring protocol compliance and the safety of patients and the resulting drugs, in this paper we introduce a way to support it with a visualization approach that uses topological models of the data. We represent the clinical data using a graph (a topological model) that captures the geometric properties of complex data, where each node corresponds to a clinical trial subject and two nodes are connected with an edge if the two subjects have similar outcomes or indicators of interest. A variety of graphs can be generated depending on the indicators of interest. Those with robust geometric structures may be analyzed further using interactive operations and various ML algorithms. Using the topological models, the researcher can easily highlight data coming from specific sites and further analyze problematic pieces of data. In an experiment, we will demonstrate this visual discovery approach compared to standard statistical methods.
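The graph construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' topological model: it assumes a simple Euclidean distance between indicator vectors and a fixed similarity threshold, and all subject IDs, sites, indicator values, and function names are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical subject-level indicators (all values invented for illustration).
subjects = {
    "S01": {"site": "A", "indicators": (0.10, 0.20)},
    "S02": {"site": "A", "indicators": (0.15, 0.25)},
    "S03": {"site": "B", "indicators": (0.90, 0.80)},
    "S04": {"site": "B", "indicators": (0.85, 0.75)},
    "S05": {"site": "C", "indicators": (0.50, 0.95)},
}


def build_similarity_graph(subjects, threshold):
    """Return an adjacency map: one node per subject, an edge when two
    subjects' indicator vectors are within `threshold` of each other."""
    graph = {sid: set() for sid in subjects}
    for a, b in combinations(subjects, 2):
        distance = dist(subjects[a]["indicators"], subjects[b]["indicators"])
        if distance <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def nodes_for_site(subjects, site):
    """List the nodes to highlight for one site."""
    return sorted(sid for sid, rec in subjects.items() if rec["site"] == site)


def isolated_nodes(graph):
    """Subjects with no similar neighbour, which a reviewer may want to inspect."""
    return sorted(sid for sid, neighbours in graph.items() if not neighbours)


# The threshold of 0.2 is an assumption chosen only for this toy data.
graph = build_similarity_graph(subjects, threshold=0.2)
site_b_nodes = nodes_for_site(subjects, "B")
outliers = isolated_nodes(graph)
```

Changing the indicators or the threshold yields a different graph, which mirrors the abstract's point that a variety of graphs can be generated depending on the indicators of interest.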

DV-323 : Tales From A Tech Support Guy: The Top Ten Most Impactful Reporting and Data Analytic Features for the SAS Programmer
Chevell Parker, SAS Institute

This document describes techniques for getting the most out of SAS by getting to know some of its most impactful features. Covered are features that can make a big impact on your job as a SAS programmer and data analyst. Some of the areas covered highlight the power of SAS reporting, which allows the SAS programmer to generate customized reports. Additional areas cover how SAS is integrated with open-source tools, as well as automation. Bonus topics include some of the most impactful features in Viya.

DV-327 : A R Markdown Structure for Automatically Generating Presentation Slides
Junze Zhang, Merck & Co., Inc.
Chuanhai Tang, Merck & Co., Inc.
Xiaohui Wang, Merck & Co., Inc.

The paper introduces a method for structuring R Markdown that separates the code defining the output format from the code that conducts the analysis and compiles the output using the predefined format. This paper gives an example of generating PowerPoint slides, though the method can be applied to various report formats. Traditionally, R Markdown is an authoring framework that executes code and generates reports simultaneously. This method adds structure to the traditional approach, which helps eliminate the manual transfer of data to slides. For example, instead of regularly compiling slides by hand for upper management to review the study results, statistical programmers can use this method to set up the R Markdown so that statisticians simply insert their code and output the results in the desired PowerPoint slide format. Consequently, updated slides with the most recent data can be generated with little to no modification. The primary objective of this paper is not to provide more detailed R Markdown techniques for creating complex PowerPoint slides, but to propose a practice that enhances the structure of your R Markdown program for improved maintainability, user-friendliness, and repeatability, thereby easing your tasks.

DV-328 : Next level Reporting: ODS and Open Source
Chevell Parker, SAS Institute

Both data analysts and SAS programmers are commonly tasked with analyzing and reporting on data from various sources. The SAS System is equipped with tools that facilitate reading data from most data sources and telling the story of your data. One of the powerful reporting features in SAS is the Output Delivery System (ODS). The Output Delivery System delivers an efficient process that enables you to create presentation-ready reports in some of the industry's most popular formats. This paper starts by discussing the basics of the Output Delivery System and its statements and destinations. Also discussed are numerous other reporting and styling components of ODS that are required for a polished report. Further discussion expands on how to fully customize Excel worksheets using the ODS Excel destination. Finally, the paper discusses the integration of the open-source Python language into SAS and how this facilitates reporting by allowing data to be exported in more formats than is currently possible. This functionality extends to file formats such as Excel, allowing customizations beyond those available with current export methods.

DV-331 : Ten Rules for Better Charts, Figures and Visuals
Kirk Paul Lafler, sasNerd

The production of charts, figures, and visuals should follow a process of displaying data in the best way possible. However, this process is far from direct or automatic. There are so many ways to represent the same data: histograms, scatter plots, bar charts, and pie charts, to name just a few. Furthermore, the same data, using the same type of plot, may be perceived very differently depending on who is looking at the figure. A more inclusive definition to produce charts, figures, and visuals would be a graphical interface between people and data. This paper highlights and expands upon the work of Nicolas P. Rougier, Michael Droettboom, and Philip E. Bourne by sharing ten rules to improve the production of charts, figures, and visuals.

DV-348 : Periodic safety reports of clinical trials
Murali Kanakenahalli, Kite Pharma
Annette Bove, ClinChoice Inc
Smita Sehgal, Orbis Clinical

Safety data is an important part of clinical trials. Periodic safety reporting helps in understanding the safety risk of all investigational products. These reports inform regulators of the evolving safety profile of an investigational drug and apprise them of actions taken to address any safety concerns. Some of the key safety reports are the Investigator's Brochure (IB), the Development Safety Update Report (DSUR), and the Periodic Safety Update Report (PSUR). The DSUR is intended as a common standard for periodic reporting on products under development among the ICH regions. The DSUR is submitted annually. The main objective of a DSUR is to present a comprehensive, thoughtful annual review and evaluation of pertinent safety information collected. An IB is a collection of clinical data about the investigational product that is the focus of the study. A new IB is usually initiated for new products in preparation for an Investigational New Drug submission. The IB is also updated annually to assess the safety risk to trial subjects. The PSUR is a comprehensive safety report that provides a periodic assessment of the safety profile of a marketed medicinal product. It covers the post-authorization phase and is submitted to regulatory authorities at semiannual or annual intervals according to regulatory requirements. This paper focuses on the workflow for producing reports for the DSUR, IB, and PSUR from a statistical programming perspective. It offers insights into the different types of output used in producing the DSUR, IB, and PSUR. It also discusses special cases, such as how to handle newer or closed studies, the effects of MedDRA up-versioning, and Adverse Events of Interest.

DV-380 : Amazing Graph Series: Swimmer Plot - Visualizing the Patient Journey: Adverse Event Severity, Medications, and Primary Endpoint
Tracy Sherman, Ephicacy
Aakar Shah, Acadia Pharmaceuticals Inc.

Have you been tasked with mapping out the patient experience, comprising adverse events (AEs), related concomitant medications, and efficacy endpoint data, over the study duration? Have you ever used the interactive features of graphs, which allow you to communicate significantly more information to stakeholders? This paper provides a vital analysis of clinical trial data, unveiling the nuanced interplay among AEs, concomitant medications, and the primary endpoint. Introducing an advanced SAS® method, the paper employs PROC SGPLOT with HIGHLOW and SCATTER statements to craft an interactive HTML swimmer plot featuring data tip functionality. Additionally, the integration of information from three studies enhances the visualization of individual patient journeys.

DV-382 : A 'Shiny' New Perspective: Unveiling Next-Generation Patient Profiles for Medical and Safety Monitoring
Helena Belloff, Formation Bio
William Lee, Formation Bio
Melanie Hullings, Formation Bio

In clinical trials, patient profile listings are vital for medical and safety monitoring, offering a dynamic visual representation of individual patient information. Reliance on manual methods to coalesce and analyze patient data for these listings has posed challenges in data integration, speed, interactivity, and flexibility. Addressing these issues, our Clinical Data Science team has developed an innovative Patient Profiles R Shiny app, revolutionizing patient data visualization and reporting in clinical trials. The Patient Profiles R Shiny app outperforms the manual workflow through seamless integration of data from various clinical trial sources, enhancing coherence and utility for medical and safety monitoring. It effectively displays longitudinal data, enabling the tracking of temporal changes and the identification of correlations between medical history, demographics, adverse events, lab results, and other patient information. Highly customizable, the app's interface can be tailored to the Medical and Safety Monitoring team's needs, providing unmatched flexibility in data visualization. Additionally, its interactive graphs and tables allow for a thorough exploration of complex data relationships, aiding monitors in understanding information more holistically, quickly, and accurately. This advancement represents a significant shift in how medical professionals and stakeholders interact with patient-level data. The app's ability to integrate and display data from various sources facilitates a seamless review and interaction, enabling sound, evidence-based decision making. In essence, the Patient Profile R Shiny app leverages advanced technology to make large, dynamic data sets comprehensible and actionable, setting a new standard of data visualization in the pharmaceutical and biotechnology industry.

DV-389 : Automation and integration of data visualization using R ESQUISSE & R SHINY
Vijayakumar Radhakrishnan, ALGORICS
Nithiya Ananthakrishnan, Algorics

In this paper, we will discuss how R ESQUISSE can be instrumental in implementing automation and MLP for data visualization. R Esquisse is an R package that provides a user-friendly interface for creating data visualisations with ggplot2. This package is especially useful for stakeholders who require quick interactive analysis and insights from varied data sources. With R Esquisse, you can quickly drag and drop variables from your data onto a plot canvas, customise plot features like titles, axis labels, and colours, and generate ggplot2 charts with a few clicks. R Esquisse's UI is comparable to that of Tableau, a prominent data visualisation software package, making it simple for Tableau users to adapt to R Esquisse. It also has advanced features like multi-language compatibility and an in-browser application, which further enhance the user experience. In this paper, we will assess in depth the installation of R Esquisse, data import, and its capabilities, to demonstrate how it can be used to produce spectacular data visualisations in R. Furthermore, the package can be integrated with R Shiny to create a browser-based data visualisation application that automates the channels supporting the evaluation of big data, with drag-and-drop functionality and additional extended features that can be customized to the needs of the study teams.

DV-395 : Pictorial Representation of Adverse Events (AE) Summary - A new perspective to look at the AE data in Clinical Trials
Pradeep Acharya, Ephicacy Lifescience Analytics
Anurag Srivastav, Ephicacy

In clinical trials, ensuring the safety of the trial subjects is critical, along with the efficacy of the drug. Sponsors are directly accountable for monitoring the safety profiles of the subjects, which is done by analyzing Adverse Events (AE). An AE is defined as any unintended change in anatomical, physiological or metabolic functions as indicated by physical signs, symptoms and/or laboratory changes occurring in any phase of the clinical study, whether or not such change is considered drug related. As a practice in clinical trials, Adverse Events data has been reviewed through summary tables and listings. There is a need for greater vigor in the analysis and presentation of the AE data, alongside the tables and listings, to help clinicians visualize and easily understand the pattern of adverse events so that they can write Clinical Study Reports (CSR) more efficiently. In this paper we introduce a new perspective for looking at AEs in the form of graphs. We consider AE data from one of the Therapeutic Areas and show the individual AEs and the AE summary in the form of graphs such as line charts, bar charts, cluster charts, and stack charts. Other relevant charts will also be displayed. The graphs will be shown with AEs by System Organ Class (SOC) and/or Preferred Term (PT) for reduced tables. Individual AEs will be displayed in figures wherever possible. The benefits and drawbacks of this visual representation will be discussed in this paper.

DV-396 : Piloting data visualization and reporting with Rshiny apps
Yun Ma, Boehringer Ingelheim
Yifan Han, Boehringer Ingelheim

The pharmaceutical industry is increasingly adopting open-source R and Shiny for web-based data visualization and reporting, given their numerous advantages such as easy comprehension, interactive analysis, and real-time updates. This presentation will provide practical examples to illustrate these benefits, such as exploratory biomarker analysis in statistics, efficacy analysis plots in oncology, and real-time safety data monitoring. In addition, we will share what our team has learned in the two years since we established a team of R programmers within our biometrics function. Our goal was to leverage the open-source R language for data visualization and reporting across the company. Over this period, we've worked closely with various stakeholders within the medical functions to align our goals and gather user requirements. Starting from a single app, we have built an application warehouse, known within the company as Data Access and Dynamic Visualization for Clinical Insights (DaVinci). This warehouse hosts over 15 different applications, ranging from heavily statistical apps for early clinical development trial planning to apps with intuitive features for safety monitoring and oncology efficacy analysis. Our team has learned from many challenges since we started two years ago, involving IT infrastructure, online systems, user experience, and more. We will share these points in more detail in the presentation. These challenges highlighted the need for careful planning, robust design, and thorough testing when developing data visualization modules.

DV-433 : Interactive Data Analysis and Exploration with composR: See the Forest AND the Trees
Steve Wade, Jazz Pharmaceuticals
Sudhir Kedare, Jazz Pharmaceuticals
Matt Travell, Jazz Pharmaceuticals
Chen Yang, Chen Yang
Jagan Mohan Achi, Jazz Pharmaceuticals

Deciding on Post-hoc analyses can be a time-consuming process and it's critical an analysis has been vetted prior to production and release. Jazz is pioneering creative techniques in data discovery using open-source technologies. Accordingly, composR is an interactive R-Shiny application that provides a user-friendly interface for rapid data analysis and exploration. This paper demonstrates how composR allows users to filter data, create new variables, and perform analyses, summaries, and visualizations. It also enables further drill-down into summaries and charts for more in-depth analysis. Although not required, composR supports CDISC standard data from a variety of file types including SAS®, XPT, CSV, Excel, and Rda. ComposR reduces time, effort and cost over traditional TLF processes by providing insights into analyses prior to formal TLF production. This ensures only necessary analyses are produced, eliminating much of the back-and-forth between departments.

DV-438 : Exploring DATALINEPATTERNS, DATACONTRASTCOLORS, DATASYMBOLS, the SAS System® REGISTRY procedure, and Data Attribute Maps (ATTRMAP) to assign invariant attributes to subjects and arms throughout a proj
Kevin Viel, Navitas Data Sciences

A subject, arm, or other grouping characteristic can appear in numerous figures, including in various deliveries with updating data, including new subjects or arms. Being able to assign a given subject invariant attributes such as a line pattern, marker, line color, and marker color may assist in the interpretation or review of the figures containing them without burdensome references to a legend. For instance, subjects in the first arm might be assigned blue lines in every plot and USUBJID 1 might be assigned a dashed line with an open diamond symbol. With care, if that subject is enrolled in a new trial, these attributes can remain with her or him, that is, be invariant across trials or updates (interim and final deliveries). Further, using the SAS System® REGISTRY procedure and testing, one can determine which combinations of attributes might be standards for a company. By creating a data set for the compound or trial, one can assign a unique combination of attributes to a subject and use that data set to create an attribute map data set for the ATTRMAP= option in the Graph Template Language template. The goals of this paper are to demonstrate 1) how to explore the attributes by graphing them, 2) how to explore the color values displayed by the REGISTRY procedure, and 3) how to create such a data set and use it in the ATTRMAP= option.
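The idea of invariant attributes can be sketched in a language-neutral way. The Python example below is an illustration only, not the paper's SAS ATTRMAP implementation: the attribute values, the subject IDs, and the `assign_attributes` helper are all hypothetical. It keeps a registry so that a subject seen in an earlier delivery retains its combination, while new subjects receive the next unused one.

```python
from itertools import product

# Hypothetical attribute pools; a real project would use its own standards.
LINE_PATTERNS = ["solid", "dash", "dot"]
MARKERS = ["circle", "diamond", "square"]
COLORS = ["blue", "red", "green"]


def assign_attributes(registry, subject_ids):
    """Give each new subject the next unused (pattern, marker, color) combination.

    Subjects already in `registry` keep their existing combination, so the
    mapping stays invariant across deliveries. Raises StopIteration if the
    pool of combinations is exhausted.
    """
    used = set(registry.values())
    free = (
        combo
        for combo in product(LINE_PATTERNS, MARKERS, COLORS)
        if combo not in used
    )
    for sid in subject_ids:
        if sid not in registry:
            registry[sid] = next(free)
    return registry


registry = {}

# Interim delivery with two subjects.
assign_attributes(registry, ["USUBJID-1", "USUBJID-2"])
interim = dict(registry)

# Final delivery adds a subject; the earlier assignments must not change.
assign_attributes(registry, ["USUBJID-3", "USUBJID-1"])
```

In practice the registry would be persisted (for example, as a data set per compound or trial, as the abstract suggests) so that later deliveries read it back instead of starting from an empty mapping.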

DV-455 : Reimagining reporting and Visualization during clinical data management
Vandita Tripathi, Ms
Manas Saha, TCS

Clinical Data Management is a pivotal process in a clinical trial, and during clinical data management it is crucial to monitor the analytics of the study parameters, which may include statistical analysis of the captured patient data as well as management data like audit trails and queries. Reports and dashboards are the primary means by which a reviewer or monitor can access all of these analytics from a clinical trial management system. Access to quality reporting data has a critical influence on data management and study success. Clinical trial management systems come with a wide variety of reports that enable users to gain insights into various parameters of the study. Most of the reports are pre-defined and offer few or no configuration options for fine-tuning the report fields and filters. In real-world situations, users often run into scenarios where they need a specific type of report containing specific fields or filter conditions that may not be available out of the box in the clinical trial management system. An AI-based solution that can understand natural language input and translate it into database queries can solve this problem. The solution accepts a plain-English prompt from the end user specifying the report fields and filter requirements; the AI then translates the requirements into database queries and executes them to generate and download the required report. This solution reimagines reporting, moving from traditional system-driven reports to user-driven custom outputs.

DV-456 : An introduction to Quarto: A Versatile Open-source Tool for Data Reporting and Visualization
Joshua Cook, University of West Florida (UWF)

In the collaborative landscape of data analysis, a common frustration among analysts stems from the need to integrate and harmonize different programming languages within a team. Teams often comprise interdisciplinary researchers, each with their unique programming preferences and expertise, leading to complexities in project integration and continuity. The difficulty in compiling and executing data projects cohesively can hinder efficiency and impede the delivery of coherent, multi-faceted data insights. This challenge necessitates a solution that can bridge the gaps between varying coding languages and methodologies to streamline team collaboration and project completion. Addressing this issue, the point of this paper is to present Quarto as an innovative solution that can unify the diverse programming approaches within a team. Quarto stands out by offering extensive cross-language support, enabling the integration of code from multiple languages into a singular, dynamic report. This versatile reporting system is tailored for the pharmaceutical and biotech industries, facilitating the creation of comprehensive reports and visualizations that cater to stakeholders at all technical levels. With Quarto, consolidating code, narrative text, and outputs into one document is seamless, accommodating outputs in various formats such as HTML, PDF, Word, Typeset, Markdown, PowerPoint, dashboards, websites, manuscripts, and even entire books. This paper serves as an introduction to Quarto's capabilities, highlighting its role in enhancing collaboration and efficiency in data science projects across the spectrum of technical expertise.

DV-458 : Quarto 1.4: Revolutionizing Open-source Dashboarding Capabilities
Joshua Cook, University of West Florida (UWF)
Kirk Paul Lafler, sasNerd

The release of Quarto 1.4 marks a significant advancement in the realm of data visualization and reporting, introducing powerful dashboarding capabilities that cater to the evolving needs of data analysts. This abstract presents an introduction to utilizing Quarto for dashboard creation, emphasizing the importance of this development for the pharmaceutical and biotech industries. Dashboarding in Quarto 1.4 is pivotal, as it allows for the integration of complex data into cohesive and interactive visual summaries. This paper will explore the innovative features of Quarto's dashboarding tool, which simplifies the process of synthesizing large datasets into accessible and actionable insights. Quarto dashboards support a variety of output formats, which enable analysts to communicate their findings to a broad audience, ranging from high-level stakeholders to the general public. The versatility of Quarto ensures that users can create custom dashboards that are both informative and engaging, regardless of the viewer's technical expertise. The paper will demonstrate how Quarto's cross-language support, combined with its dashboarding functionality, provides a comprehensive environment for data reporting. This integration is crucial for the biotech sector, where making informed decisions often depends on understanding complex datasets. By leveraging Quarto's dashboarding tools, analysts from a variety of programming backgrounds can more effectively present their data, leading to improved decision-making processes and clearer communication of findings. The potential of Quarto to enhance open-source data visualization and reporting practices within the industry will be a focal point of this paper.

Hands-on Training

HT-101 : Deep Dive into the BIMO (Bioresearch Monitoring) Package Submission
Mathura Ramanathan, IQVIA
Nancy Brucken, IQVIA

As a part of the review process for regulatory submissions to the Food and Drug Administration (FDA) by both pharmaceutical companies and CROs, the FDA carries out site-level inspections to ensure the integrity of the data submitted, and to verify that the rights, health, and welfare of those who participated in the studies were protected. To efficiently audit the sites, the FDA has established the Bioresearch Monitoring (BIMO) Program for the studies being submitted for their review. Currently, the BIMO package includes three required components: 1. Clinical Study-Level Information, 2. Subject-Level Data Line Listings by Clinical site, and 3. Summary-Level Clinical Site (CLINSITE) Dataset, and an optional but important document, called the BIMO Data Reviewer's Guide (BDRG). The scope and format of these components are governed by the BIMO Technical Conformance Guide (TCG) and BDRG template. In this training, the authors clarify the generation of these BIMO components based on the latest TCG and Guidance documents, discuss the potential issues that can surface during the generation and address any questions that the audience brings up during the seminar.

HT-111 : A Gentle Introduction to SAS Packages
Bart Jablonski, yabwon

When working with SAS code, especially when it becomes more and more complex, there is a point in time when a developer decides to break it into small pieces. The developer creates separate files for macros, formats/informats, and for functions or data too. Eventually the code is ready and tested and sooner or later you will want to share code with another SAS programmer. Maybe a friend has written a bunch of cool macros that will help you get your work done faster. Or maybe you have written a pack of functions that would be useful to your friend. There is a chance you have developed a program using local PC SAS, and you want to deploy it to a server, perhaps with a different OS. If your code is complex (with dependencies such as multiple macros, formats, datasets, etc.), it can be difficult to share. Often when you try to share code, the receiver will quickly encounter an error because of a missing helper macro, missing format, or whatever. Small challenge, isn't it? How nice it would be to have it all (i.e. the code and its structure) wrapped up in a single file (a SAS package) which could be copied and deployed, independent from OS, with a one-liner like: %loadPackage(MyPackage). In the presentation an idea of how to create such a SAS package in a fast and convenient way, using the SAS Packages Framework, will be shared.

HT-118 : The Art of Defensive SAS Programming
Philip Holland, Holland Numerics

This paper discusses how you cope with the following data scenario: the input data set is defined in so far as the variable names and lengths are fixed, but the content of each variable is in some way uncertain. How do you write a SAS program that can cope appropriately with data uncertainty?

HT-143 : The New Shape Of SAS Code
Charu Shankar, SAS Institute

What is SAS code? How many SAS languages exist to manipulate data? If these questions got you thinking, then this presentation is just for you. Come explore the many dimensions of SAS code and the commonalities and differences between the various languages in SAS. In this presentation you will learn the basics of four familiar SAS languages (the Base SAS DATA step, PROC SQL, Perl language elements, and the SAS macro language), plus a fifth: an introduction to the language of CAS (Cloud Analytic Services). Learn to check your data with elegant techniques like Boolean logic in PROC SQL, operators in the DATA step and PROC step, functions like the SCAN function within the DATA step, efficient checking of your data with Perl regular expressions, and last but not least the amazing marriage between PROC SQL and the SAS macro language to reuse data. You will also learn where manipulating data in the cloud with CAS can be beneficial. This presentation will focus on coding techniques for data investigation and manipulation using Base SAS and SAS Viya.

HT-152 : GenAI to Enhance Your Statistical Programming
Phil Bowsher, RStudio Inc.

Posit/RStudio will be providing a hands-on session on using AI to enhance open-source statistical programming. This session will discuss opportunities and applications for AI to empower statistical programmers. This talk will explore and discuss GenAI to support programmers in the process of writing code and provide a hands-on opportunity to test it out.

HT-157 : Understanding Administrative Healthcare Datasets using SAS® programming tools.
Jayanth Iyengar, Data Systems Consultants LLC

Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, which include Healthcare Claims (Medicare, Commercial Insurance, & Pharmacy), Hospital Inpatient, and Hospital Outpatient. This training seminar will give attendees an overview and detailed explanation of the different types of healthcare data, and the SAS programming constructs to work with them. The workshop will engage attendees with a series of SAS exercises involving healthcare datasets.

HT-197 : Building Complex Graphics from Simple Plot Types
Dan Heath, SAS

There are many types of graphics, such as survival plots, adverse event plots, and forest plots, used to report clinical data. However, many of these graphics might not be available as a simple request. Even if they are available, there can be many variations on these graphics, and the reporting requirements can differ from one company to another. In this hands-on workshop, I want to show you how to construct these types of graphics by combining simple plot types, sometimes in very creative ways. We will discuss how to think about a graph as the sum of its parts. This will empower you to create these types of graphics in whatever form you need.

HT-201 : Transitioning from SAS to R
Ashley Tarasiewicz, Atorus Research
Chelsea Dickens, Atorus Research

As companies continue to realize the benefits of having a multilingual programming team, many are encountering the same problem: there is very little training specific to the traditional clinical trial workflows we use on a daily basis. This hands-on training will leverage a clinical programmer's existing SAS® and clinical programming knowledge to take experts in SAS® and train them to use R for clinical trials. In the training, trainees will be introduced to key functions in base R, the tidyverse, and the pharmaverse that can be used to replicate common tasks they're currently using SAS® for. We'll use publicly available CDISC data and side-by-side comparisons of SAS® and R code to ensure a translation of existing skills, not an introduction to something completely different. Through a mixture of lecture to introduce concepts and guided practice to allow concepts to be practiced directly after learning them, trainees will walk away with an introduction to R, the knowledge of how to use R to do some of what they're already doing in SAS®, and future steps for where they can go next in their R programming journey.

HT-413 : Complex Custom Clinical Graphs Step by Step with SAS® ODS Statistical Graphics
Richann Watson, DataRich Consulting
Josh Horstman, Nested Loop Consulting

The ODS Statistical Graphics package is a powerful tool for creating the complex, highly customized graphs often produced when reporting clinical trial results. These tools include the ODS Statistical Graphics procedures, such as the SGPLOT procedure, as well as the Graph Template Language (GTL). The SG procedures give the programmer a convenient procedural interface to much of the functionality of ODS Statistical Graphics, while GTL provides unparalleled flexibility to customize nearly any graph that one can imagine. In this hands-on workshop, we step through a series of increasingly sophisticated examples demonstrating how to use these tools to build clinical graphics by starting with a basic plot and adding additional elements until the desired result is achieved.

HT-459 : Hands-on Python PDFs: Using the pypdf Library To Programmatically Design, Complete, Read, and Extract Data from PDF Forms Having Digital Signatures
Troy Martin Hughes, Datmesis Analytics

The Python pypdf library facilitates the programmatic creation, editing, completion, and extraction of PDF forms. Using only pypdf, developers can design a PDF form, make the form available to end users, prepopulate the form through data-driven programming, and programmatically collect additional user data. This hands-on workshop introduces users to the powerful pypdf library, and demonstrates a solution that enables PDF forms to be generated (and populated) dynamically, including forms that contain digital signature blocks. The real-life business case examines the permission-to-publish (aka copyright-release) forms that are regularly required to attend conferences like WUSS and PharmaSUG. Attendees will learn how to build these forms dynamically using a data-driven software design approach, and to extract data to populate external data products such as HTML webpages and Excel workbooks. All Python code will be shared so that attendees can implement similar solutions within their respective organizations.

Leadership Skills

LS-134 : Recruiting Neurodivergent Candidates using the Specialisterne Approach
Patrick Grimes, Parexel

This paper explores the utilization of Specialisterne, a neurodiversity-focused recruitment and consultancy organization, in the process of hiring neurodivergent candidates within a Statistical programming group. Neurodivergent individuals possess unique cognitive strengths, including exceptional attention to detail, pattern recognition, and logical reasoning, which are highly applicable to roles requiring complex data analysis and programming skills. The aim of this study is to document the experiences and outcomes of incorporating neurodivergent candidates into the Statistical programming group and to evaluate the effectiveness of the Specialisterne approach. The paper discusses the identification of suitable candidates through specific assessment methodologies tailored to the needs and abilities of neurodivergent individuals. Furthermore, it examines the adaptation of the recruitment process to support successful integration and ongoing development of neurodivergent employees within the team. The paper also considers the benefits and challenges observed throughout the implementation of this hiring approach, from both the perspective of the organization and the neurodivergent employees themselves. By sharing the experiences gained and lessons learned from utilizing Specialisterne in the recruitment and integration of neurodivergent individuals, this paper aims to provide insights and practical guidance for other organizations seeking to enhance diversity and inclusivity within their statistical programming groups.

LS-167 : Soft Skills to Gain a Competitive Edge in the 21st Century Job Market
Kirk Paul Lafler, sasNerd

Today's economy requires its workforce to acquire two types of skills: hard skills or job-related knowledge and abilities to help us perform specific job responsibilities effectively, and soft skills or personal qualities that help us thrive in the workplace. So, what are examples of hard skills? Examples of hard skills include SAS and Python programming, data analysis, project management, and market research. Soft skills on the other hand are not always measurable and consist of non-technical skills that describe the characteristics, attributes, and traits that are associated with one's personality. Soft skills enable effective and harmonious interaction with others in the workplace and are acquired from the roles and/or experiences one has had. The good news is that soft skills can be learned and, more importantly, provide one with a competitive edge in today's demanding and evolving workplace.

LS-176 : Effectively Manage the Programming Team Using MS Team
Jeff Xia, Merck
Simiao Ye, Merck & Co., Inc.

Microsoft Teams is a powerful platform that not only serves as a messaging application but also facilitates real-time collaboration, meetings, and easy file sharing. In this paper, we demonstrate how MS Teams can be utilized to efficiently manage a programming team, promote knowledge sharing, and streamline the onboarding process for new team members. Topics covered include:
- Seamless file sharing
- Knowledgebase sharing for tips, best practices, and know-how
- Effective communication via announcements for important messages
By leveraging MS Teams, programming teams can optimize productivity, enhance communication, and foster a culture of collaboration.

LS-286 : Unlock Your Greatness: Embrace the Power of Coaching
Priscilla Gathoni, Wakanyi Enterprises Inc.

As an individual, do you know what you want to be and to do? What is your purpose in life? Do you understand your strengths, weaknesses, opportunities, and threats (SWOT) analysis? Do you know what you need to get to the next level in your career and life? Do you find it challenging to solve problems? Do you know how to be truly present in your conversations? Do you struggle to handle different personality types? What is your biggest frustration at work right now? Coaching is a vital tool for every individual to have at their disposal if they are committed to fulfilling and advancing their potential, as well as balancing life and work. Coaching relationships are built upon truth, openness, and trust and allow the person being coached to be responsible for their own results and think creatively. Well-executed coaching empowers individuals to take action, increase their personal performance and professional effectiveness in problem-solving and decision-making skills, and influence others. We will explore the value coaching brings to your life and work and why coaching should be part of your professional development plan.

LS-304 : Translation from statistical to programming: effective communication between programmers and statisticians
Diana Avetisian, IQVIA

Statisticians and programmers who work in clinical trials know that effective communication between the two groups is a key factor in a study's success. But do we really know what statisticians expect from programmers when working on complex statistical models, and what programmers need in order to be efficient in creating efficacy outputs? Programmers have a complex task when it comes to implementing statistical analyses: they create ADaM datasets based on CDISC standards, which allow them to create outputs, and then produce TLFs to display the results of the statistical analysis. It is not always obvious how data should be structured to suit a particular statistical analysis, which is why programmers and statisticians work together to achieve the best results. At the same time, statisticians have their own challenges. What kind of information do programmers need to create TLFs? Is it enough to have a programming note in the shells and some code examples, or do programmers need more? To answer these questions, it is useful for each group to have some guidance for understanding the other's needs. In this paper the author discusses:
- the process of efficacy output creation from the raw data to the final table;
- some common questions from statisticians and programmers;
- some tips and tricks to help programmers understand the requirements of a statistical analysis even without a statistical background;
- programmers' expectations of statisticians.

LS-317 : What Being a Peer-to-Peer Mentor Offers - Perspective from an Individual Project Level Contributor
LaNae Schaal, AstraZeneca

Typically, mentoring is done to help a person obtain skills to climb the career ladder. The traditional concept of mentoring is a structured relationship in which a senior individual imparts wisdom to a less experienced individual. For statistical programming, this could involve someone from a management level or a subject matter expert meeting with a more junior programmer. That hierarchical understanding overlooks a second possible structure for mentoring: peer-to-peer, in which two individuals similarly situated in an organization commit to regular mentoring meetings. For example, a study level Senior Statistical Analyst might serve as mentor to a Senior Statistical Programmer on a different study. During the last 7 years, I have participated in both types of mentoring relationships. This paper will briefly address components of effective peer-to-peer meetings. Then I will outline how being the mentor for peer-to-peer mentees has helped me develop professionally. For example, I have a) developed leadership skills such as communication and listening, b) broadened my understanding of the career opportunities in Biometrics, c) improved self-awareness, and d) increased job satisfaction. As you explore professional development for yourself or your organization, please consider how peer-to-peer mentoring can be beneficial where you currently are.

LS-335 : Creating a Culture of Engagement - Role of a Manager
Monali Khanna, YPrime

Work could be just work, or it could additionally bring a sense of purpose and satisfaction. What could contribute most to your team being high-performing and happier at work? Could you do something to make your team feel genuinely cared for, appreciated, and more productive at work? A high-performing workforce is essential for survival and growth, and a highly engaged workforce is more likely to go the extra mile to contribute to the organization's mission and goals. All of us who work in the life sciences and healthcare industry have a unique opportunity: our work has a strong effect on many lives, and recognizing that can lead to an engaging and fulfilling career. As leaders, we can engage our teams more, so they feel empowered, contribute to creating essential goals and objectives, and feel good about it. This presentation will focus on what employee engagement is, why employee engagement is critical, what employees need to feel engaged, and how to engage employees. It will describe the employee engagement strategies and how to implement them. It will suggest to those in managerial positions how to make team members feel heard, valued, and trusted, unlock their full potential, and achieve better performance outcomes.

LS-345 : Leadership Lessons from Another Life: How my Previous Career Helped Me as a Statistician
Christiana Hawn, Catalyst Flex
Lily Ray, Catalyst Clinical Research

Experienced statisticians and programmers recognize that successful clinical research projects depend on more than just technical skills. Statisticians and programmers must collaborate with cross-functional teams, negotiate for defensible methods and reasonable timelines, facilitate productive meetings, and communicate methods and results in a manner accessible to colleagues of different backgrounds. Statisticians and programmers who change careers may appear to be "starting over", and while this is true from a technical standpoint, the value of the leadership and collaboration skills they bring from their former careers should not be overlooked. Prior to becoming a biostatistician, Christiana was a Coast Guard officer, driving ships across the Pacific Ocean. Lily was a social scientist in Alaska, conducting participatory environmental research and leading healthcare program evaluations. This paper will include reflections on our career changes and how the skills developed in our previous career fields, though not considered experience from a technical standpoint, have been invaluable for our transition into the world of biostatistics in clinical research. Going back to school provided the technical foundation for our new career, but there are intangible skills that we carried forward with us from the lives we left behind. We will share a few examples using stories and scenarios from the past and linking them to our challenges in the present. The intent of this discussion is to encourage both career changers and those hiring or managing them to consider and leverage the unique skills and perspectives they bring to the table from their previous experiences.

LS-351 : A Framework for Risk-Based Oversight for Fully Outsourced Clinical Studies
Anbu Damodaran, Alexion Pharmaceuticals
Neha Srivastava, Fortrea

In recent years, the pharmaceutical industry has seen a significant rise in outsourcing clinical trials to Contract Research Organizations (CROs). Estimates suggest that 75% to 80% of the biopharmaceutical industry's R&D costs are outsourced, highlighting the prevalence of this practice. These trials, spanning various therapeutic areas, are entrusted to multiple CRO partners. The oversight of these diverse CROs, with their varied practices, and the siloed approach by Therapeutic Area (TA) functional leads from the sponsor, can lead to inconsistent deliverables. This created a dire need for more consistent and effective CRO oversight. In this paper, we describe a new approach to CRO oversight that was implemented by a pharmaceutical company. The new approach involved the creation of a new group that was responsible for overseeing all outsourced studies, regardless of therapeutic area. This new group created a framework for a risk-based oversight plan. The paper highlights various key steps of this framework from Partnership Initiation to Study Closure. The paper provides insights on different methods, trackers and tools used to aid the oversight of CRO work. It also provides references for Management mental models for outsourcing success. Overall, this framework stands as a robust solution to enhance the quality and uniformity of CRO oversight in the evolving landscape of clinical research.

LS-357 : Harmony in Motion: Nurturing Work-Life Balance for Sustainable Well-being
Purvi Kalra, Ephicacy
Varsha Patil, Ephicacy lifescience

In today's fast-paced and interconnected world, achieving a harmonious work-life balance has become a critical pursuit for individuals, organizations, and society at large. This paper delves into the multifaceted dimensions of work-life balance, exploring the intricate interplay between professional responsibilities and personal well-being. The paper begins by examining the evolving nature of work in the digital age and its impact on individuals' lives. It explores the psychological and physiological implications of prolonged work hours, emphasizing the importance of setting clear boundaries to prevent burnout and enhance overall mental health. Furthermore, this paper investigates organizational practices that promote work-life balance, encompassing flexible work arrangements, supportive corporate cultures, and the role of leadership in fostering a healthy work environment. Key attention is given to the role of technology in both enabling and challenging work-life balance. The paper scrutinizes the influence of remote work, digital connectivity, and the "always-on" culture on individuals' ability to detach from work during non-working hours. Drawing on global perspectives, the paper analyses cultural variations in attitudes toward work-life balance and considers how societal expectations and norms influence individuals' choices and experiences. The impact of gender roles and caregiver responsibilities on work-life balance is a focal point, with insights into promoting inclusivity and diversity within work environments. It advocates creating environments that prioritize well-being, thereby fostering a more sustainable and fulfilling approach to work and life.

LS-371 : Go Get 'Em: Manager's Guide to Make a Winning Business Proposal for Technology Solutions
Dilip Raghunathan, Regeneron

This paper empowers a leader to put together a winning business case to secure resources and funding for delivering technology solutions that enable, enhance, or transform business capabilities. The paper will present a foundational framework that encapsulates the concept of business capabilities and how a leader can represent the priority and maturity of the capability in its current state, and use it as a baseline to set aspirational targets for a future state. The paper will also present a framework by which the leader can capture quantitative and qualitative aspects of the target state's business value, and use that as a driver along with the case for urgency and risk of inaction to unequivocally make a winning argument for why the organization should direct its time and resources to the leader's call for action. The paper will finally cover how to develop a plan of action and approach to bringing the technology solution to fruition to improve the business capability. With a blend of Marvel magic to infuse a dash of humor and invoke the child within us, combined with practical advice, we will make the theoretical frameworks come alive with a specific example of a leader wishing to bring about a Statistical Computing Environment (SCE) to their organization. The concepts covered can be extrapolated to support the journey of delivering any technology solution and empower the leader to be successful in making a winning proposal.

LS-383 : Ongoing Trends and Strategies to Fine-tune the CRO/Sponsor Partnership - Perspectives from Statistical Programming
Mathura Ramanathan, IQVIA

Sponsor and Contract Research Organization (CRO) partnerships are becoming more frequent, and critical in the management and handling of data from clinical trials. In this paper, we describe various types of these interactions, factors that shape and govern these interactions, and pros and cons of each type. Additionally, we highlight how multiple factors on both sides determine the magnitude, length, and scope of the relationships. Outsourcing relationships range from short-term transactions to fully integrated partnerships. While partnerships occur in a variety of clinical trial processes, here we focus on the business processes surrounding the needs of clinical and statistical programming. In this paper, we propose effective strategies for optimizing conditions to foster a fully integrated long-term Sponsor-CRO partnership that benefits both sides, and enables high quality clinical submissions.

LS-410 : Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
Josh Horstman, Nested Loop Consulting
Richann Watson, DataRich Consulting

While many statisticians and programmers are content in a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. In this paper, two seasoned consultants share their experiences going independent. Topics include the advantages and disadvantages of independent consulting, getting started, finding work, operating your business, and what it takes to succeed. Whether you're thinking of declaring your own independence or just interested in hearing stories from the trenches, you're sure to gain a new perspective on this exciting adventure.

LS-443 : Data Harmony Revolution: Rocking Trials with Clinical Data Literacy
Melanie Hullings, TrialSpark
Andrew Burd, Formation Bio
Helena Belloff, Formation Bio
Emily Murphy, Formation Bio

Clinical Data Analytics and Programming (CDAP) is centered around the collection, monitoring, reporting, and analysis of clinical trial data. Our team's objective is to improve the efficiency of clinical trials by producing high quality, reliable, and statistically robust clinical datasets. In order to accomplish this, analysts on the team must have an understanding of all aspects of technical frameworks and clinical trial operations, including medical and safety monitoring. At the same time, clinical trials require a large, multi-disciplinary team across many different highly-specialized clinical, operational, and technical roles. Because of the varied backgrounds, it is hard for everyone to understand what the other teams are doing, interdependencies, and downstream effects of decisions. In an effort to bridge any gaps in understanding across the interdisciplinary teams, CDAP ran a 12-week internal internship program open to all employees. The purpose of this program was to teach the regulatory and operational basics of clinical trial data collection and analysis, empower people to work with advanced technical tools, and develop programming skills they could use to augment their own workflows. The result of this program, which culminated in a final project presentation showcasing R programming and data analysis skills, was a workforce with improved clinical and coding skills, enhanced transparency and empathy between teams, and a foundation for internal and external training programs.

Metadata Management

MM-225 : Variable Subset Codelist
Kang Xie, AbbVie

A codelist defines the controlled terminology of a variable within a dataset. The controlled terminology provides the values required for submission to FDA, PMDA, or other regulatory agencies. For the standard codelist in an ADaM dataset, we use the codelist in NCI/CDISC SDTM Terminology and ADaM Terminology. When the controlled terminology of a standard codelist contains more values than a variable can take, we can assign a subset codelist of this standard codelist to this variable. In addition, when the controlled terminology of the codelist of a parameter-related variable depends on a parameter or a set of parameters within this variable, we can assign subset codelists to this variable by parameter. This paper describes the approaches to assigning subset codelists to these two types of variables, aiming to refine their allowable values and to acquire extra information via the names of the subset codelists. Moreover, this paper provides my comments about the codelists in the example included in Define-XML v2.1.6 Updated Release Package in CDISC Define-XML standards.
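
As a rough illustration of the subset-codelist idea described in this abstract (not the author's implementation), the Python sketch below treats a codelist as a set of terms. A subset codelist is accepted only if all of its terms come from the parent codelist, and observed values are then checked against the subset, either for a whole variable or parameter by parameter. The function names, the parameter codes P1 and P2, and the response terms are hypothetical and chosen only for this example.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def make_subset(parent: Iterable[str], terms: Iterable[str]) -> FrozenSet[str]:
    """Return `terms` as a subset codelist, rejecting any term absent from the parent."""
    parent_terms = frozenset(parent)
    subset = frozenset(terms)
    extra = subset - parent_terms
    if extra:
        raise ValueError(f"terms not in parent codelist: {sorted(extra)}")
    return subset


def invalid_values(values: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """List the distinct values that fall outside the allowed codelist."""
    return sorted(set(values) - set(allowed))


def invalid_by_param(
    rows: Iterable[Tuple[str, str]],
    subsets: Dict[str, FrozenSet[str]],
) -> List[Tuple[str, str]]:
    """Return (parameter code, value) rows whose value is outside that parameter's subset.

    Rows whose parameter code has no subset assigned are not checked.
    """
    return [(p, v) for p, v in rows if p in subsets and v not in subsets[p]]


# Case 1: a variable that can only take part of a standard codelist.
NY = {"N", "NA", "U", "Y"}          # terms of a yes/no style parent codelist
Y_ONLY = make_subset(NY, {"Y"})     # subset for a flag that is either "Y" or missing

# Case 2: a parameter-related variable with one subset per parameter.
# The parent terms and the parameter codes P1/P2 are hypothetical.
RESPONSE = {"CR", "PR", "SD", "PD", "NE"}
BY_PARAM = {
    "P1": make_subset(RESPONSE, {"CR", "PR"}),
    "P2": make_subset(RESPONSE, {"SD", "PD", "NE"}),
}

if __name__ == "__main__":
    print(invalid_values(["Y", "N", "Y"], Y_ONLY))
    print(invalid_by_param([("P1", "CR"), ("P2", "CR")], BY_PARAM))
```

Naming each subset after its purpose (here `Y_ONLY`) mirrors the abstract's point that the subset codelist's name can itself carry information about the variable.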

MM-226 : Methodology for Automating TOC Extraction from Word Documents to Excel
Jeetender Chauhan, Merck & Co., Inc.
Madhusudhan Ginnaram, Merck & Co., Inc., Rahway, NJ, USA
Sarad Nepal, Merck & Co., Inc., Rahway, NJ, USA
Jaime Yan, Merck & Co., Inc., Rahway, NJ, USA

In the complex realm of large-scale clinical study management, efficient data handling and traceability are paramount. This paper introduces a beneficial tool designed within Excel, which utilizes Visual Basic for Applications (VBA) to revolutionize this process. The tool's primary functionality is automatically extracting table titles from mock document templates and seamlessly populating them into an Excel file. This process is streamlined for convenience: a simple click on the 'Prepare Extract' button enables the tool to prompt the user to select a target mock document, autonomously transferring the relevant content into Excel. Furthermore, the tool incorporates a comprehensive trace system, crucial for monitoring clinical studies' progression and validation levels, thereby significantly reducing manual data entry and potential errors. This paper will focus on this innovative tool's technicalities, applications, and benefits in managing complex clinical data. Automating title extraction and significantly enhancing traceability advances clinical study management, offering a more structured, efficient, and error-reducing approach for handling extensive reports in large studies.

MM-240 : Managing a Single Set of SDTM and ADaM Specifications across All Your Phase 1 Trials
Avani Kaja, Pfizer

Submitting data in CDISC standards has become mandatory in the drug development industry. The specifications for such standardized data structures and controlled terminology can be managed in metadata collections called data definition tables (DDTs). While it is common practice for many companies to use a single consolidated master DDT (MDDT) to manage all studies within a single product, this paper will discuss how to standardize and implement an MDDT across multiple products in oncology early-stage development where studies from different products and indications tend to share common design and many analysis endpoints. The paper emphasizes implementation, process, and best practices to accommodate different products into one template. It will discuss the process of setting up and maintaining the specifications across various products. In addition, the paper will illustrate effective ways of communicating across different product teams to align specifications within the single MDDT. Lastly, the paper will highlight the process of adapting these specifications when products move to late-stage development. Implementing this process in early-stage development has yielded a significant reduction in set-up time for SDTM and ADaM specifications, higher compliance to departmental standards, and better utilization of departmental utilities and macros, all of which contribute to much more efficient and streamlined support for study conduct.

MM-245 : Relax with Pinnacle 21's RESTful API
Trevor Mankus, Pinnacle 21

Harness the power of Pinnacle 21 Enterprise's REST API and relax! Programmatically extract metadata from within your validation environment, from study level specifications to broader organizational level metadata such as Standards and Terminologies. The metadata retrieved includes datasets, variables, value level, codelists, terms, methods, comments, and even Analysis Results Metadata (ARM) for ADaM studies. These objects can then be incorporated into the programming process, such as setting up attributes for variables, assigning dataset labels, establishing key variables, and pairing coded and decoded values. This paper will detail how to access the Pinnacle 21 Enterprise REST API and discuss best practices on implementation within the data standardization process. It will also cover the various options for export file formats as well as provide examples for practical use of metadata to ensure automating metadata retrieval is a breeze.

MM-267 : A Practical Approach to Automating SDTM Using a Metadata-Driven Method That Leverages CRF Specifications and SDTM Standards
Xiangchen Cui, Crisprtx Therapeutics
Min Chen, CRISPR Therapeutics
Jessie Wang, CRISPR Therapeutics

Automated SDTM generation has several benefits, including efficiency, accuracy, compliance with regulatory requirements, and the speeding up of the data analysis process. However, due to the dissimilarity and varying complexity of different CRFs, SDTM domains, and eSource systems among different studies, development of a tool to automate SDTM has been a challenging task for sponsors, CROs, and EDC service providers. We propose a new approach in automatic generation of SAS® code for SDTM. A SAS-based macro is developed based on CRF specifications from an EDC database and SDTM standards. Our approach is user-friendly with high transparency, easily scalable to multiple studies, and especially useful for relatively smaller sponsors and CROs, for there is no requirement to standardize CRFs and raw dataset variables' attributes (which is the best practice but can be too work-intensive) and no required expertise in other computer languages.

MM-358 : Optimizing Clinical Data Processes: Harnessing the Power of Metadata Repository (MDR) for Innovative Study Design (ISD) and Integrated Summary of Safety (ISS) / Efficacy (ISE)
Lakshmi Mantha, Ephicacy Life Sciences
Purvi Kalra, Ephicacy
Arunateja Gottapu, Mr

The implementation of a Metadata Repository (MDR) system in clinical data processes has proven instrumental in significantly reducing costs and development timelines for organizations. MDR adoption facilitates reusability, process automation, and end-to-end data traceability, leading to enhanced data handling capabilities and process standardization. This paper focuses on leveraging MDR for Innovative Study Design (ISD) and Integrated Summary of Safety (ISS) / Integrated Summary of Efficacy (ISE), presenting use cases that demonstrate its impact on various crucial aspects:
1. Comparing metadata across dictionaries and standards
2. Performing a Case Report Form (CRF) compliance check and cloning using an MDR
3. Comparing one study with another using an MDR
4. Integrating MDR with a Statistical Computing Environment (SCE) for seamless automation
The implementation of MDR emerges as a cornerstone in achieving excellence in data handling capabilities within the clinical research domain.

MM-447 : Automating third party data transfer through digitized Electronic DTA Management
Vandita Tripathi, Ms
Manas Saha, Ms

Clinical data management (CDM) is a pivotal process in clinical trials. Not all data is captured directly in case report forms (CRFs); a large portion is collected from external sources such as third-party vendors and labs. This type of data is called "non-CRF data". Maintaining data integrity and quality in non-CRF data has a critical influence on data management and study success. To govern the quality and integrity of non-CRF data exchange, Data Transfer Agreements (DTAs) are signed between sponsor and vendor organizations. A DTA defines the structure, timelines, and data definitions that enable the transfer of non-CRF data from the vendor to the clinical database. The DTA authoring and review process often takes months, which is a significant challenge in the industry. An online, collaborative, highly automated, real-time DTA authoring solution with an integrated multi-step DTA review and approval system can significantly benefit the non-CRF data management landscape. The system can utilize predefined Master DTA templates, data models, and code lists to automatically draft the DTA document. The auto-generated DTA document can be viewed and edited in real time by multiple editors. The reviewers can quickly track the changes and accept or reject them with comments. Once the edits are reviewed and finalized, an approver can approve the DTA with an e-signature. This can cut the timelines from months to days by automating DTA drafting and utilizing real-time collaboration. This could potentially turn out to be a paradigm shift for third-party data management.

Real World Evidence and Big Data

RW-125 : Reconstruction of Individual Patient Data (IPD) from Published Kaplan-Meier Curves Using Guyot's Algorithm: Step-by-Step Programming in R
Ajay Gupta, Daiichi Sankyo
Natalie Dennis, Daiichi Sankyo

Secondary analysis may require the use of reconstructed patient-level data from published Kaplan-Meier (KM) curves to support a number of different objectives, including indirect treatment comparisons within the context of economic evaluations. Guyot (2012) developed an algorithm that reconstructs individual patient data (IPD) for time-to-event endpoints using published KM curves. This presentation will provide step-by-step instructions and a use case for executing the Guyot (2012) algorithm to reconstruct IPD from published KM curves in R.

RW-227 : A SAS® Macro Approach: Defining Line of Therapy Using Real-World Data in Oncology
Yu Feng, Merck

In oncology, Line of Therapy refers to the specific phase or sequence of treatment that a patient undergoes in the management of their cancer. Cancer treatment is often organized into sequential lines of therapy, each representing a distinct phase or set of interventions. However, most healthcare databases lack explicit information on treatment line of therapy. This paper introduces an innovative SAS® macro designed to depict patient treatment regimens in oncology using a defined algorithm. The algorithm initially defines the treatment regimen within a specified timeframe of the index date. Stopping drugs from the combination regimen does not advance the treatment line, but adding a new drug will start the next line of therapy. If the duration between two cycles, lacking any chemotherapy or biologic regimen, exceeds the allowable gap days, a new line of therapy is instituted. The proposed SAS® macro integrates an embedded macro to create types and flags, distinguishing various scenarios. These indicators are then utilized to subset fully defined and non-completed defined data. A loop is employed to process the remaining data, ultimately combining each defined line to capture its entire therapeutic pathway. This macro provides a comprehensive tool for analyzing real-world oncology data. The paper showcases the macro's methodology, applications, and advantages, emphasizing its potential to refine treatment regimens and improve our understanding of patient journeys in cancer care.
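The rules described in this abstract can be sketched in a few lines. The paper's implementation is a SAS® macro; the Python below is only an illustrative sketch, not the authors' code. The function name `assign_lines`, the 28-day regimen window, the 90-day allowable gap, and the choice of the first administration date as the index date are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta


def assign_lines(records, regimen_window_days=28, max_gap_days=90):
    """Assign a line-of-therapy number to each administration date.

    records: iterable of (administration_date, drug_name) tuples.
    Assumption: the first administration date serves as the index date.
    Returns a dict mapping each administration date to its line number.
    """
    by_date = defaultdict(set)
    for admin_date, drug in records:
        by_date[admin_date].add(drug)
    dates = sorted(by_date)
    if not dates:
        return {}

    def regimen_from(start):
        # The regimen is every drug given within the window after a line starts.
        end = start + timedelta(days=regimen_window_days)
        return set().union(*(by_date[d] for d in dates if start <= d <= end))

    line = 1
    regimen = regimen_from(dates[0])
    result = {dates[0]: line}
    for prev, current in zip(dates, dates[1:]):
        gap = (current - prev).days
        new_drug = bool(by_date[current] - regimen)
        # A new drug or an over-long treatment gap starts the next line.
        # Dropping a drug from the regimen does not.
        if gap > max_gap_days or new_drug:
            line += 1
            regimen = regimen_from(current)
        result[current] = line
    return result


example = [
    (date(2023, 1, 1), "A"), (date(2023, 1, 1), "B"),
    (date(2023, 1, 22), "A"), (date(2023, 1, 22), "B"),
    (date(2023, 2, 12), "A"),                           # B dropped: same line
    (date(2023, 3, 5), "A"), (date(2023, 3, 5), "C"),   # C added: next line
    (date(2023, 8, 1), "C"),                            # long gap: next line
]
lines = assign_lines(example)
```

Grouping administrations by date before applying the rules keeps drugs given on the same day in the same line.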

RW-260 : Fidelity Assessment of Real-World Data as An External Control Arm
Karthik Venkataraman, Algorics
Rajesh Karthikeyan, Algorics

Real-world data is being extensively used across different areas of clinical research, including personalized medicine, patient recruitment, treatment effectiveness, and post-approval safety evaluation. Furthermore, the FDA's commitment to considering real-world evidence in regulatory decision-making emphasizes the emerging role of real-world data in clinical research. An important though niche use of real-world data is to serve as a control arm (CA) in clinical trials, which can be a reliable alternative to randomized controlled trials. However, real-world data presents a unique set of challenges in its use as an external control arm due to the 3 V's: Volume, Velocity, and Veracity. Hence, despite a multitude of advantages, it is evident that significant planning and effort are needed to leverage the power of real-world data as a control arm. In this presentation, we will explore a scientific framework for conducting a fidelity assessment of real-world data in the planning, execution, and reporting of a control arm.

RW-275 : Win a PS5! How to Run and Compare Propensity Score Matching Performance Across Multiple Algorithms in Five Minutes or Less
Catherine Briggs, SAS
Sherrine Eid, SAS Institute
Samiul Haque, SAS Institute
Robert Collins, SAS Institute

With the increased use of real-world data (RWD) in clinical and healthcare settings, having a comprehensive comparison of propensity score matching (PSM) algorithms available to researchers is vital. In this paper, we will discuss different avenues for PSM and show the strengths and weaknesses of each using simulated data. We will cover aspects of performance of the algorithms from statistical measures to computing resources. The final part of the paper will demonstrate the effect of the matching algorithms on estimating the causal effect of treatment on an outcome. This paper will use matching algorithms from SAS, R, and Python and show their results through a SAS Viya Visual Analytics Dashboard. We will assume some knowledge of statistical modeling and RWD and welcome anyone interested in our work.

RW-390 : Unraveling the Layers within Neural Networks: Designing Artificial and Convolutional Neural Networks for Classification and Regression Using Python's Keras & TensorFlow
Ryan Lafler, Premier Analytics Consulting, LLC
Anna Wade, Premier Analytics Consulting, LLC

Capable of accepting and mapping complex relationships hidden within structured and unstructured data, Neural Networks are a subset of Deep Learning that involves layers of neurons interacting with, transforming, and passing data through successive layers to develop highly flexible and robust predictive models. Capable of regression, classification, and generating entirely new data from existing sources, Neural Networks are adept at untangling complex, real-world problems and powering recent breakthroughs in industries ranging from finance, healthcare, the life sciences, and climatology to video remastering and business analytics for decision-making and modeling purposes. This presentation offers users an intuitive, example-oriented guide to developing, training, and evaluating Artificial Neural Network (ANN) and Convolutional Neural Network (CNN) architectures in Python for regression and classification.

RW-421 : Applications of Machine Learning and Artificial Intelligence in Real World Data in Personalized Medicine for Non-Small Cell Lung Cancer Patients
Sherrine Eid, SAS Institute
Robert Collins, SAS Institute
Samiul Haque, SAS Institute

Introduction: Lung cancer was the most commonly diagnosed cancer and leading cause of cancer death in 2018. Most cases are Non-Small Cell Lung Cancer (NSCLC). Machine learning methods (ML) and Real World Data (RWD) make personalized medicine a viable possibility to save patients' lives. We explored patient journeys and novel opportunities to save lives. Chronic Obstructive Pulmonary Disease (COPD) commonly persists during cancer treatment and requires interventions that could jeopardize the effectiveness of the treatment and the patient's prognosis. Methods: This study included 1.2 million NSCLC (ICD-10 C34.*) patients from 2019-2020 using Symphony Health Claims (Symphony Health Solutions, ICON Plc). Patient journeys using diagnoses were established with Path Analysis. Likelihood of metastatic cancer (MC) was determined using ML and artificial intelligence (AI) methods and compared using KS (Youden) and ROC using SAS Viya 4.0 (SAS Institute, Inc.). Results: Over 39% of patients were diagnosed with MC. The factors most related to MC were Fluid and Electrolyte Disorders, Weight Loss, Coagulopathy and Liver Disease, respectively. Furthermore, path analyses illustrated the impact of COPD on these patients and their care. Gradient boosting was the best fit model among the ML methods (KS (Youden)=0.286) followed by Forest, Logistic Regression and Bayesian Network (0.279, 0.277, and 0.274, respectively). ML demonstrated greater accuracy in assessing the likelihood of MC. Conclusion: NSCLC patients should be assessed for comorbidities to better anticipate secondary treatment needs that may interfere with their cancer treatment. This approach to personalized medicine could contribute to better prognoses for these patients.

RW-450 : Towards understanding Neurological manifestations of Lyme disease through a machine learning approach with patient registry data
Lorraine Johnson,
Lara Kassab, UCLA
Jingyi Liu, Colorado College
Deanna Needell, UCLA
Mira Shapiro, Analytic Designers LLC

The Centers for Disease Control and Prevention (CDC) estimates that 476,000 new cases of Lyme disease occur annually. Many patients remain ill after treatment, and these patients are often referred to as having persistent or chronic Lyme disease (PLD/CLD). It is estimated that nearly 2 million people in the US have PLD/CLD. PLD/CLD patients suffer from significantly impaired quality of life, utilize healthcare services more frequently and have greater limitations on their ability to work than the general population and patients with other chronic diseases. Neurological symptoms are common in PLD/CLD and have been linked to functional and structural changes in the brain. Very little, however, is known about the neurological manifestations of Lyme disease: contributing factors, diagnostic tools, or best treatments. Here, we analyze the MyLymeData patient registry, a project of, that contains over 9,000 patients with Lyme disease. Through statistical analysis and state-of-the-art machine learning (ML) methods, we identify common patterns across patients with neurological symptoms. We compare the statistics across subgroups of neurological symptoms both statistically and through ML topic modeling based on non-negative matrix factorization.

RW-453 : Interfacing with Large-scale Clinical Trials Data: The Database for Aggregate Analysis of
Joshua Cook, University of West Florida (UWF)
Achraf Cohen, University of West Florida (UWF)

In the realm of healthcare research, leveraging real-world and big data is paramount, especially when utilizing historical clinical trials data to construct probabilistic and predictive models for future research. This paper illustrates an efficient method for mining and analyzing such data from the Aggregate Analysis of (AACT) database using tools like R, PostgreSQL, the tidyverse suite, and Quarto. The AACT database, a publicly available repository containing up-to-date data from registered clinical studies at, is instrumental for research in the Real-World Evidence and Big Data domain. Employing R, a comprehensive statistical language, in conjunction with PostgreSQL, a sophisticated open-source relational database system, enables precise data querying. The tidyverse suite, a collection of R packages, is adept at data manipulation and graphic presentation, while Quarto serves as a dynamic reporting tool. This methodology not only simplifies the data extraction process but also enhances the analysis of extensive clinical trial details, such as study protocols, enrollment figures, completion times, and outcome measures. The end product is a robust platform that facilitates the generation of statistical models capable of predicting trial outcomes as well as informing future trials, thus refining the efficiency of healthcare research. In essence, the integration of R, PostgreSQL, and tidyverse with the AACT database provides a powerful toolkit for researchers. This combination offers a sophisticated means to navigate and utilize historical trial data, contributing significantly to the advancement of clinical research methodologies.

Solution Development

SD-114 : SAS® Program Efficiency for Beginners
Bruce Gilsen, Federal Reserve Board of Governors

This paper presents simple efficiency techniques that can benefit SAS users on all platforms at all levels of experience. 1. Create a SAS data set by reading long records from a flat file with an INPUT statement. Keep selected records based on the values of only a few incoming variables. 2. Create a new SAS data set by reading an existing SAS data set with a SET statement. Keep selected observations based on the values of only a few incoming variables. 3. Subset a SAS data set for use in a SAS procedure. 4. In IF, WHERE, DO WHILE, or DO UNTIL statements, use OR operators or an IN operator to test if at least one of multiple conditions is true or use AND operators to test if multiple conditions are all true. 5. Select observations from a SAS data set with a WHERE statement. 6. In a DATA step, read a SAS data set with many variables to create a new SAS data set. Only a few of the variables are needed in the DATA step or the new SAS data set. 7. Create a SAS data set containing all observations from two existing SAS data sets. The variables in the two data sets have the same length and type. 8. Process a SAS data set in a DATA step when no output SAS data set is needed. 9. Execute a SAS DATA step in which the denominator of a division operation could be zero.

SD-141 : "Prompt it", not "Google it" : Prompt Engineering for Statistical Programmers and Biostatisticians
Kevin Lee, Karuna Therapeutics

Since its release, ChatGPT has rapidly gained popularity, reaching 100 million users within 2 months. A new concept has even emerged: "Prompt it" is now the new "Google it". Research shows ChatGPT users complete projects 25% faster. The paper is written for Statistical Programmers and Biostatisticians who want to improve their productivity and efficiency by using ChatGPT prompts better. The paper explores the pivotal role of prompts in enhancing the performance and versatility of ChatGPT and other Large Language Models. The paper shows how Statistical Programmers and Biostatisticians can use ChatGPT's capabilities for content development (e.g., emails, images), information search, programming assistance in R, SAS and Python, result interpretation, and more. The paper also elucidates the distinctive advantages of employing prompts over traditional search methods. It emphasizes the unique characteristics of prompt engineering in ChatGPT. Various techniques, such as zero-shot learning, few-shot learning, reflection, chain of thought, and tree of thought, are dissected to illustrate the nuanced ways in which prompts can be engineered to optimize outcomes. The comprehensive exploration also offers insights into how to prompt better by adding constraints, incorporating more context, setting roles, coaching with feedback, probing further, and introducing step-by-step instructions to ChatGPT. The paper discusses ChatGPT's functionality for modifying and resubmitting the prompt, copying the answer, regenerating the answer, and continuing the previous prompt. The paper highlights how Statistical Programmers and Biostatisticians can use and lead the transformative impact of prompts to be more productive and effective.

SD-165 : Benefits, Challenges, and Opportunities with Open-Source Software Integration
Kirk Paul Lafler, sasNerd
Ryan Lafler, Premier Analytics Consulting, LLC
Joshua Cook, University of West Florida (UWF)
Stephen Sloan, Dawson D R

The open-source world is alive, well and growing in popularity. This paper highlights the many benefits found with open source software (OSS) including its flexibility, agility, talent attraction, and the collaborative power of community; the trends show that open-source is ubiquitous penetrating many critical technologies we depend on, where more technology companies recognize the importance of the open-source community leading to more initiatives and sponsorships that support open-source creators; the challenges of open source including compatibility vulnerability issues, security limitations, intellectual property issues, warranty issues, and inconsistent developer practices; and the opportunities coming out of the open source community including cloud architecture, open standards, and the collaborative nature of community.

SD-166 : The 5 CATs in the Hat - Sleek Concatenation String Functions
Kirk Paul Lafler, sasNerd

SAS functions are an essential component of the SAS Base software. Representing a variety of built-in and callable routines, functions serve as the "work horses" in the SAS software, providing users with "ready-to-use" tools designed to ease the burden of writing and testing often lengthy and complex code for a variety of programming tasks. The advantage of using SAS functions is evident from their relative ease of use and their ability to provide a more efficient, robust, and scalable approach to simplifying a process or programming task. SAS functions span several functional categories, including character, numeric, character string matching, data concatenation, truncation, data transformation, search, date and time, arithmetic and trigonometric, hyperbolic, state and zip code, macro, random number, statistical and probability, financial, SAS file I/O, external files, external routines, and sort, to name a few. This paper highlights the old, alternate, and new methods of concatenating strings and/or variables together.

SD-179 : Developing Web Apps in SAS Visual Analytics
Jim Box, SAS Institute
Samiul Haque, SAS Institute

SAS Viya provides capabilities to develop web applications that let you use HTML pages to provide inputs to programs that use SAS, R, and/or Python to create, display, and share analysis results across your organization. The GUI uses standard HTML programming to collect the inputs and the system leverages the existing scalability and security of your cloud environment.

SD-198 : AutoSDTM Design and Implementation With SAS Macros
Chengxin Li, AutoCheng Clinical Data Services LLC

CDISC standards have been widely deployed in the pharmaceutical industry for the past decade. The main effort has focused on complying with CDISC standards. While compliance remains necessary, the focus should shift towards developing a reusable automated solution. This paper introduces a universal autoSDTM solution implemented with SAS macros. Although the metadata of raw data varies widely across companies, the metadata patterns are limited and the metadata of the target SDTM is rigidly defined in the SDTMIG. This makes automated SDTM mapping feasible. Design goals: 1. No dependence on data management settings, thus a universal solution; 2. Besides compliance with the current SDTMIG 3.3, submission-ready output; 3. A transparent process to facilitate review; 4. Adherence to SDTM programmers' practices, thus easy to learn with low transition cost; 5. User friendliness; 6. Most importantly, high quality and efficiency. Implementations: 1. One SAS macro per domain; 2. Macro parameters aligned with the SDTM variables, thus parameterizing and getting; 3. Input conventions and checks for dataset name, variable name, and format pre-coded, with SAS log warnings and errors for user friendliness; 4. Outputs including the SDTM dataset, log, mapping specifications, and executable plain SAS code dynamically generated per domain; 5. Controlled terminology portal enabled; 6. Data preprocessing supported for special cases. The autoSDTM solution has been verified to be practical and of high quality and efficiency. Excluding trial design domains, a total of 50 SDTM domain macros have been developed. The tool has been validated in multiple studies and applied to client productions.

SD-200 : Bridging AI and Clinical Research: A New Era of Data Management with ChatGPT
Illia Skliar, Intego Clinical

The integration of ChatGPT, an advanced language model, into the realm of clinical data management marks a significant evolution in the methodology of statistical programming within clinical trials. In this session we will delve into the fundamental aspects of ChatGPT, explaining its mechanics and potential as a transformative tool for generating sample data. Specifically, we'll explore how ChatGPT can automatically create SAS datasets containing raw data using study protocols, statistical analysis plans, and Case Report Forms (CRFs). Moving on, the paper will describe an innovative approach to automating the production of raw datasets for training purposes, tailored to the specific needs of clinical trials. The discussion emphasizes how ChatGPT can improve data readiness and facilitate a more efficient trial initiation process. Finally, we will analyze the secure connection restrictions and regulatory constraints, including FDA standards, that regulate the use of AI and machine learning tools in clinical research, emphasizing the need for a balanced and compliant approach in the deployment of such innovative technologies. Closing the session, we will address the crucial question of how effective ChatGPT is in enhancing SAS data management for early-phase clinical trials, offering a straightforward assessment of its real-world impact.

SD-211 : Utility Macros for Data Exploration of Clinical Libraries
Matt Maloney, Bristol Myers Squibb

Programmers are always looking for ways to simplify their work. A combination of laziness and creativity allows SAS users to leverage the SAS macro language to create their own tools to perform monotonous parts of data exploration. Clinical libraries often contain many datasets that need to be iterated over to perform certain data exploration tasks. This can be largely accomplished through the SAS macro language. This paper presents two of these utility macros. One is a macro that searches for any character match in SAS dataset names, dataset labels, variable names, variable labels, and variable values to quickly find information from a library of SAS datasets. The other is a macro that compares the datasets in two different SAS libraries for any discrepancies in datasets and performs simple value comparison. It differs from other similar macros in its intent of use being to compare libraries where changes are generally not expected, displaying a table of likewise differences, and then lastly running a second pass of proc compare only for datasets where value differences are found. The macros share many similarities that highlight a certain style of SAS macro writing encompassing lightweight, self-documenting, and non-invasive coding making them simple to understand and easily approachable.

SD-217 : Semi-Automated and Modularized Approach to Generate Tables for Clinical Study - Categorical Data Report
William Wei, Merck & Co, Inc.
Shunbing Zhao, Merck & Co.

A table, listing and figure (TLF) package, incorporated in a clinical study report (CSR), serves to encapsulate the data from a clinical trial. Tables constitute the majority of the TLF package. The CSR tables, which primarily summarize safety and efficacy data, can be categorized into two major modules 1) categorical (ex: gender, sex, race, etc.) and 2) continuous (ex: age, time to event, etc.). This paper explores and discusses a design of the first major module - categorical, which could generate multiple output formats to accommodate the report requirements. The design of this module is independent, flexible, portable, informative, and parameter-driven, and can be exported or linked to different reports. This module can be used independently to produce a categorical table report in different formats. It also can produce different output formats and the reports can be combined for a more complex table. For instance, a table may consist of categorical analysis blocks, continuous blocks, and other blocks, such as CI, P-value, etc. This approach provides users with substantial flexibility to construct any table they have in mind and present it as desired. This approach significantly reduces the time and effort required for programming CSR tables.

SD-239 : Automation of Report Generation Beyond Macro
Hong Qi, Merck & Co., Inc.
Mary Varughese, Merck & Co., Inc.

SAS macros have been widely used to improve efficiency and standardization of data analysis and reporting in medical product development and approval. These macros are designed for repetitive tasks and for streamlining the process of report generation, and are created at different levels (global, therapeutic area-specific, and protocol-specific) to meet the requirements of various reporting needs. However, relying solely on macros may not be sufficient, particularly in late-stage oncology clinical studies where numerous data analysis reports are generated during the course of a study. Within each deliverable, reports often exhibit common characteristics such as tumor type, biomarker, subgroups, and more. Traditionally, a separate macro-call program is created for each report, requiring manual adjustments to address different requests. To overcome these challenges and enhance automation, an approach has been developed using standardized macro-call programs in conjunction with a Level-II SAS program. The macro-call programs are designed to be applicable to multiple deliverables without the need for manual revision, and the Level-II program specifies common macro parameters for each deliverable, such as indication, population, biomarker, and other requests. This approach has proven to be highly effective and efficient. By leveraging existing macros and taking automation process steps beyond macros, it reduces programming time, saves resources, and minimizes review efforts while enhancing the overall quality of the generated reports. In addition, there is potential for generalizing this approach to encompass multiple emerging programming languages in the domain of clinical trial report generation.

SD-241 : Dynamic patient-centric reporting using R Markdown
Umayal Annamalai, Algorics
Srihari Hanumantha, Algorics
Madhu Annamalai, Algorics

In the era of data-driven healthcare, the need for efficient, reproducible, and customizable patient documentation is paramount. Our methodology leverages the versatility of R Markdown to seamlessly integrate diverse clinical data sources, statistical analyses, and interactive visualizations into comprehensive patient profiles. This approach not only enhances the speed and accuracy of profile generation but also facilitates standardization and collaboration amongst cross-functional clinical teams. In this presentation, we bring together our real-life use cases and insights to showcase the potential of R Markdown as a valuable tool in transforming patient data into meaningful profiles, ultimately contributing to improved clinical decision-making and patient outcomes. From this session, the audience will learn how to effectively leverage and implement R Markdown for powerful patient profile generation and reporting within their organizations.

SD-243 : Unravelling the SDTM Automation Process through the Utilization of SDTM Transformation Template
Lakshmi Mantha, Ephicacy Life Sciences
Inbasakaran Ramesh, Ephicacy Life Sciences

SDTM datasets play a pivotal role in supporting the submission of various study-related activities, including annual DSUR/IB, PSUR, and CSR deliverables. The automation of the SDTM process is gaining momentum, aligning with established CDISC guidelines. However, the complexity arises due to variations in data collection standards across different companies. The efficient creation of SDTM datasets can significantly reduce the workload by saving numerous hours. In this paper, we will provide a comprehensive explanation of the SDTM transformation template, serving as a crucial input to the SDTM automation engine. This engine is designed to automate the SDTM program generation process, aiming to achieve at least a 70 percent automation rate. The detailed exploration of the SDTM transformation template and its integration into the automation engine will shed light on how this innovative approach significantly enhances the efficiency of SDTM program creation. The SDTM transformation template acts as a blueprint, delineating the steps and procedures required for a systematic and efficient transition. It serves as a roadmap for mapping variables, defining relationships, and establishing the necessary transformations to ensure data conforms to SDTM standards.

SD-255 : Define-XML Conversion: A General Approach on Content Extraction Using Python
Danfeng Fu, MSD
Dickson Wanjau, Merck & Co., Inc.
Ben Gao, MSD

Define-XML is a critical component of clinical trial data packages submitted to regulatory agencies as part of a new drug/biologics application. It provides a standardized way to exchange the metadata for SDTM and ADaM datasets, which are used for analysis and reporting. Given the rich information embedded into Define-XML, there is a need to extract the desired contents and convert them into SAS datasets in order to be able to automate consistency checks against study datasets and documents, e.g., ADRG. However, the extraction is challenging owing to XML language's natural complexity with nested elements and namespaces. SAS is a natural choice due to its widespread use in the pharmaceutical industry. However, it is a cumbersome process since one needs to create XML map files for parsing the XML structure. Encoding would be another issue to tackle, especially when Define-XML contains non-ASCII characters. In this paper, we will introduce an approach using Python, leveraging its extensive packages to extract contents from all types of XML schemas, not limited to the CDISC ODM schema. This approach doesn't require creation of map files and can better handle non-ASCII characters. It also offers a user-friendly method for presenting a structural overview of a selected XML element (sub-elements, contents, and attributes) and flexibility to extract desired contents without relying on a pre-defined map file to create tabulated datasets or stylesheets to visualize the XML contents. This facilitates the automation of cross-document consistency checks and potentially automates other downstream processes, ultimately enhancing submission quality and compliance.
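As a rough illustration of the map-file-free idea described above, the sketch below uses only Python's standard `xml.etree.ElementTree`. It is not the authors' implementation: the `SAMPLE` document is a heavily simplified, made-up fragment, and the helper names `local`, `outline`, and `extract` are hypothetical. Namespace prefixes are stripped so that the same code can walk any XML schema, and non-ASCII text passes through unchanged because Python strings are Unicode.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up fragment for illustration only (not a valid Define-XML file).
SAMPLE = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study OID="S1">
    <MetaDataVersion OID="MDV1" Name="Demo">
      <ItemGroupDef OID="IG.DM" Name="DM" def:Structure="One record per subject">
        <Description><TranslatedText xml:lang="fr">Démographie</TranslatedText></Description>
      </ItemGroupDef>
      <ItemGroupDef OID="IG.AE" Name="AE" def:Structure="One record per event">
        <Description><TranslatedText xml:lang="en">Adverse Events</TranslatedText></Description>
      </ItemGroupDef>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def local(name):
    """Drop the '{namespace-uri}' prefix ElementTree puts on tags and attributes."""
    return name.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Return an indented overview of an element: tag names and attribute names."""
    attrs = ", ".join(local(k) for k in element.attrib)
    lines = ["  " * depth + local(element.tag) + (f" [{attrs}]" if attrs else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def extract(xml_text, element_name):
    """Tabulate every element with the given local name as a list of dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter():
        if local(element.tag) != element_name:
            continue
        row = {local(k): v for k, v in element.attrib.items()}
        # Take the first nested TranslatedText, if any, as the description.
        for child in element.iter():
            if local(child.tag) == "TranslatedText":
                row["Description"] = child.text
                break
        rows.append(row)
    return rows


rows = extract(SAMPLE, "ItemGroupDef")
overview = outline(ET.fromstring(SAMPLE))
```

Each dict in `rows` could then be written out as one record of a tabulated dataset; joining `overview` with newlines gives a quick structural view of the document.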

SD-262 : Integration of SAS GRID environment and SF-36 Health Survey scoring API with SAS Packages
Bart Jablonski, yabwon

The SF-36v2 Health Survey is a 36-question survey that measures functional health and well-being from the patient's point of view. As a generic health survey, SF-36v2 can be used across age, disease and treatment groups, in contrast to disease-specific health surveys, which focus on a particular disease or condition. Raw SF-36v2 survey data are subject to scoring. Such scoring is provided by tools delivered, for instance, by QualityMetric Inc. The QM API provides all the functionality needed to score and save survey data in the PRO Insight Smart Measurement System. Once a survey is scored, the API returns scores and interpretations based upon the processed data. At Takeda Pharmaceuticals, a SAS Package named "QMsf36scoring" was developed to facilitate scoring of SF-36v2 data, stored in SAS data sets, by the QM API. The "QMsf36scoring" package is a generic solution that can be executed in any SAS environment able to connect to the QM API tool. Thanks to the SAS Package approach, interaction with the API is easy and straightforward, which allows the scoring process to be embedded naturally into the SDTM or ADaM development process. The presentation will cover details of the design and implementation of the package.

SD-266 : A Tool for Automated Comparison of Core Variables Across ADaM Specifications Files
Amy Zhang, Merck & Co.
Huei-Ling Chen, Merck & Co.

A specifications file serves as fundamental guidance for creating CDISC Analysis Dataset Model (ADaM) datasets. Dataset variables and their associated attributes in the specifications file have predefined standards. In particular, the core attribute is essential; per CDISC IG, the Core column defines whether a variable is required, conditionally required, or permissible. When preparing the ADaM specifications for a study, the programmer frequently encounters the task of ensuring the core variable categorization follows ADaMIG standards, company standards, or other protocols of the same compound or indication. To ease this process, having a tool to quickly compare the list of core variables across standards or studies can help facilitate ADaM dataset preparation. Storing these specifications in an Excel file is a common approach. Some existing software tools provide spreadsheet comparisons, but the findings are often overwhelming to digest. This simple SAS macro provides programmers with a quick glimpse into differences in core variable categorizations across multiple Excel files based on user-specified ADaM datasets and selection criteria. The results are summarized in an Excel file format, with each ADaM dataset as its own spreadsheet presenting the list of core variables across standards or studies.
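The comparison this abstract describes can be sketched as follows. The paper's tool is a SAS macro that reads and writes Excel files; the Python below is only a simplified in-memory sketch, and the function name `compare_core`, the source names, and the Core values shown are hypothetical examples rather than content from any real specification.

```python
def compare_core(specs):
    """Find variables whose Core value differs across specification sources.

    specs: dict mapping a source name (a standard or a study) to a dict of
    variable name -> Core value. A variable missing from a source is reported
    as None for that source.
    """
    variables = sorted(set().union(*(spec.keys() for spec in specs.values())))
    differences = {}
    for variable in variables:
        values = {source: spec.get(variable) for source, spec in specs.items()}
        if len(set(values.values())) > 1:
            differences[variable] = values
    return differences


# Hypothetical Core categorizations for one dataset, for illustration only.
adsl_specs = {
    "Standard": {"STUDYID": "Req", "TRT01P": "Req", "AGEGR1": "Perm"},
    "Study A": {"STUDYID": "Req", "TRT01P": "Cond", "AGEGR1": "Perm", "RACEGR1": "Perm"},
}
differences = compare_core(adsl_specs)
```

Reporting only the variables that disagree keeps the result short, in line with the abstract's point that full spreadsheet comparisons are often overwhelming to digest.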

SD-308 : AI-Enhanced Clinical Data Interaction: Enhancing Data Management, Programming, and Validation Using LangChain and Pandas in Python
Arunateja Gottapu, Mr

This paper presents an innovative artificial intelligence (AI) application that utilizes the Python libraries LangChain and Pandas to transform the management of Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) datasets. This novel approach incorporates natural language interactions to adapt to the distinct requirements of data management, programming, and validation teams in the clinical research field. The data management team gains advantages by writing complex queries as natural language commands, which enables effortless data exploration and extraction of useful insights from SDTM and ADaM datasets. The programming team benefits from code-generation assistance, which uses LangChain's natural language understanding to produce Python, SAS, or R code snippets. This enhances the efficiency of the development process while ensuring accuracy and consistency in code generation for data analysis tasks. The validation team benefits from assistance in reviewing datasets and in identifying and flagging anomalies. This approach enhances data quality and assists validation teams in ensuring compliance with regulatory standards. The AI-Enhanced Clinical Data Interaction tool combines LangChain's linguistic capabilities with Pandas' powerful data processing functionalities in the Python environment, providing a comprehensive solution for handling SDTM and ADaM data. This innovative method not only simplifies operations but also supports smooth team communication, leading to a significant improvement in the efficiency and effectiveness of clinical data interaction and analysis.

SD-318 : Streamlined EDC data to SDTM Mapping with Medidata RAVE ALS
Yunsheng Wang, ClinChoice
Erik Hansen, ClinChoice
Chao Wang, ClinChoice
Tina Wu, ClinChoice

The manual process of annotating case report forms (CRFs) and mapping raw clinical trial data to the Study Data Tabulation Model (SDTM) standard is both resource-intensive and susceptible to human error. This traditional approach also lacks efficient traceability between CRFs, SDTM mapping files, and final SDTM datasets. This paper explores an innovative method to streamline the mapping process from Electronic Data Capture (EDC) to SDTM by utilizing the Medidata RAVE Architect Loading Specification (ALS) file. The ALS can be customized to establish links between EDC specifications, CRFs, SDTM mapping specifications, and SDTM datasets, creating a seamless end-to-end data flow. By leveraging the customized ALS file, this approach enhances traceability, accuracy, and overall efficiency compared to traditional manual methods. The demonstration of the automated mapping process covers the entire spectrum from EDC data collection to the generation of SDTM datasets. Serving as a potential model for optimizing EDC to SDTM mapping workflows, this structured and automated approach reduces the dependence on manual programming, thereby improving data quality and expediting drug development timelines.

SD-343 : Two hats, one noggin: Perspectives on working as a developer and as a user of the admiral R package for creating ADaMs.
Benjamin Straub, GlaxoSmithKline

R packages form a critical part of the R ecosystem by allowing users to gain additional scoped functions and documentation around common tasks. I have had the privilege of helping to develop the admiral R package, which seeks to allow users of R to build ADaM datasets. At the same time, I have had to use admiral in my studies for building ADaMs. In my presentation and paper, I will share perspectives on being a developer, a user, and the intersection of these two roles. As a developer, I will share what it is like to work on a cross-Pharma initiative developing software in a regulated environment. As a user, I will share what it is like to use admiral and R in my studies and issues that I have encountered. I will conclude with discussing the intersection of being both a user and developer and how it has been a positive feedback loop on developing and enhancing functions and documentation in admiral.

SD-356 : Standardizing Validation Data Sets (VALDS) as matrices indexed by Page, Section, Row, and Columns (PSRC) to improve Validation and output creation and revisions.
Kevin Viel, Navitas Data Sciences

Table and listing (TL) shells are blueprints specifying the dimensions of the output: PAGES, SECTIONS, ROWS, and COLUMNS (PSRC), their orders, and the precision of the values in the cells indexed by those dimensions. Values may be a composite such as Mean (SD). Viel introduced the Validation Data Set (VALDS), a standardized data set derived and used as input to create output (RTF/PDF). Importantly, the VALDS formalizes the structure and format of such data sets, standardizing 1) TL programs of various types and projects, 2) their Validation, and 3) the manual (spot) checking of the output against the VALDS. The programmer (independently) derives a value and inserts or "pigeon holes" it into its respective cell in the VALDS. For instance, a derived value may be assigned to the cell indexed as COLUMN 3 of PAGE 2, SECTION 3, and ROW 1 of the TL. The goals of this paper are to 1) introduce the creation of an analytic data set (ADS) so that typically at most one SAS System® FREQ procedure and one MEANS procedure are required for the entire program, 2) index the (composite) value using its PAGE, SECTION, ROW, and COLUMN labels to place it in its appropriate cell of the VALDS matrix, and 3) demonstrate an efficient way to populate default values, including for dimensions specified by the shell but absent in the data. Lastly, this paper demonstrates the ease of making major revisions to the output with minimal changes to the TL programs.
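
The "pigeon-holing" idea, including default values for cells that the shell defines but the data do not populate, might be sketched as follows. The paper works in SAS; this Python version, the function name `build_valds`, the dictionary layout, and the default value are all assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

# A cell is indexed by (page, section, row, column).
Key = Tuple[int, int, int, int]


def build_valds(shell: Dict[str, List[int]],
                derived: Dict[Key, str],
                default: str = "0") -> Dict[Key, str]:
    """Create every cell the shell specifies, then fill in derived values.

    Cells that the shell defines but the data never produced keep the default.
    """
    valds = {}
    for key in product(shell["pages"], shell["sections"],
                       shell["rows"], shell["columns"]):
        valds[key] = derived.get(key, default)
    return valds


if __name__ == "__main__":
    # Hypothetical shell dimensions and one derived composite value.
    shell = {"pages": [1, 2], "sections": [1, 2, 3],
             "rows": [1, 2], "columns": [1, 2, 3]}
    derived = {(2, 3, 1, 3): "12.3 (4.5)"}  # PAGE 2, SECTION 3, ROW 1, COLUMN 3
    valds = build_valds(shell, derived)
    print(len(valds), valds[(2, 3, 1, 3)], valds[(1, 1, 1, 1)])
```

Because the shell drives the set of cells, adding a row or column to the output means changing the shell definition rather than the derivation code.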

SD-365 : Readlog Utility: Python based Log Tool and the First Step of a Comprehensive QC System
Zhihao Luo, Vertex Pharmaceuticals

Log checking is a crucial part of SAS programming quality control. I have developed a Python-based keyword-lookup log checking and summary tool that provides a major quality-of-life improvement when working with log files. For individual users, who spend a large amount of time fixing log issues and validating outputs, the tool serves as a graphical user interface with a better approach to identifying and navigating issues in log files. Log messages are categorized into four levels based on importance, and users can also create custom log tags within SAS programs for different purposes (e.g., data issues, temporary programming logic, hard coding, cross checks, programming reminders). For study leads, the tool can batch process log files in a target folder or run through all subfolders within a reporting effort area. It highlights all important log issues for each log file and creates a summary report for documentation purposes. I will also briefly talk about a comprehensive QC system that I am currently developing, which would include log checking, validation result summaries, program and log file timestamp checks, missing program or missing PROC COMPARE checks, programmer workload breakdown, and email notification, all integrated into one centralized tool. It will obtain information directly from the SDTM/ADaM/TFL metadata, the Readlog tool, and several SAS macros to provide a dynamically generated programming tracker.
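
A keyword-lookup log scan with severity levels could look roughly like the sketch below. It is not the Readlog tool itself: the rule table, the four-level assignment, and the `CUSTOM_TAG` marker are hypothetical choices made for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical keyword rules as (level, pattern); level 1 is the most severe.
RULES = [
    (1, re.compile(r"^ERROR")),
    (2, re.compile(r"^WARNING")),
    (3, re.compile(r"^NOTE:.*(uninitialized|MERGE statement has more than one)", re.I)),
    (4, re.compile(r"^NOTE: CUSTOM_TAG")),  # assumed user-defined tag written to the log
]

Finding = Tuple[int, int, str]  # (level, line number, message text)


def scan_log(lines: Iterable[str]) -> List[Finding]:
    """Return the first matching rule for every log line that matches one."""
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for level, pattern in RULES:
            if pattern.search(line):
                findings.append((level, lineno, line.rstrip()))
                break
    return findings


def summarize(findings: List[Finding]) -> Counter:
    """Count findings per severity level, e.g. for a summary report."""
    return Counter(level for level, _, _ in findings)


if __name__ == "__main__":
    sample = [
        "NOTE: The data set WORK.ADSL has 100 observations and 20 variables.",
        "WARNING: Multiple lengths were specified for the variable USUBJID.",
        "NOTE: Variable AVAL is uninitialized.",
    ]
    found = scan_log(sample)
    for finding in found:
        print(finding)
    print(dict(summarize(found)))
```

Batch processing for a study lead would amount to calling `scan_log` on each file in a folder and collecting the per-file summaries.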

SD-370 : Enhancing FDA Debarment List Compliance through Automated Data Analysis Using Python and SAS
Yongjiang (Jerry) Xu, Genmab
Karen Xu, Northeastern University
Suzanne Viselli, Genmab

This paper presents an innovative approach to ensuring compliance with the FDA's debarment list by employing Python and SAS for automated data analysis. The primary objective was to develop and streamline the process of checking individuals and entities against the FDA's debarment list, thereby reducing the risk of regulatory violations. We used Python to scrape and extract debarment data from the FDA's website and SAS to analyze the data, providing a comprehensive compliance solution. The results demonstrated a significant reduction in the manual labor and time required for compliance checks, with the added benefit of minimizing human error. The scalable and efficient tool automatically flags any individuals or entities from international trials that appear on the debarment list, facilitating timely and effective decision-making.

SD-397 : Advancing Regulatory Intelligence with conversational and generative AI
Saurabh Das, Tata Consultancy Services
Rohit Kadam, Mr.
Rajasekhar Gadde, Mr.
Niketan Panchal, Mr.
Saroj Sah, Mr.

Staying up to date with regulatory requirements, especially in the field of life sciences, is extremely important. Regulatory authorities need to update regulations because of new guidelines, rapid innovation, changing global health scenarios, etc. to ensure enhanced safety of patients. Life sciences organizations need to adhere to these changing regulations and make data-driven decisions. Traditionally, this required a regulatory professional to frequently and manually search through multiple regulatory sites and monitor changes in guidelines, updates, etc. This process is extremely time-consuming and effort-intensive. To overcome this challenge, a chatbot for regulatory services promises an effective solution. This chatbot can simulate human-like conversations to provide a user-friendly and easy-to-understand interface. At the same time, the technology behind the chatbot would make the process of searching through regulatory information and staying up to date more efficient and convenient. This chatbot aims to use large language models (LLMs) coupled with next-generation graph databases, search and analytic engines, etc. to create a comprehensive AI-based automation platform. The chatbot is envisaged to change user interaction from an exhaustive, ad hoc, multi-step manual activity to a conversational, integrated process. As a result, users would be able to converse with the chatbot and receive up-to-date information about any food, drug, or medical device from regulatory bodies.

SD-401 : Excel Email Automation Tool: Streamlining Email Creation and Scheduling
Xinran Hu, Merck
Jeff Xia, Merck

In the evolving landscape of digital communication, efficiency in email management has become paramount for businesses and organizations. This paper introduces the Excel Email Automation Tool, a solution designed to automate the drafting and scheduling of emails using Microsoft Excel as a front-end interface, seamlessly integrated with Microsoft Outlook. In the intricate landscape of pharmaceutical research and development, accurate and timely communication is paramount. This industry, laden with vast datasets, regulatory requirements, and critical timelines, also grapples with duplicated communication among technical leads, trial managers, and vendor teams. The Excel Email Automation Tool emerges as an essential solution to these multifaceted challenges. By bridging the gap between complex data management in Microsoft Excel and effective communication via Microsoft Outlook, this tool introduces a streamlined approach to correspondence. It automates the drafting and scheduling of emails, ensuring that pivotal information is conveyed accurately, consistently, and promptly. By reducing manual interventions and the potential for communication overlaps, the tool upholds the integrity and credibility of communications.

SD-412 : Safety Signals from Patient Narratives PLUS: Augmenting Artificial Intelligence to Enhance Generative AI Value
Sherrine Eid, SAS Institute
Sundaresh Sankaran, SAS Institute

Natural Language Processing (NLP) is integral to most Artificial Intelligence solutions that classify and extract events from patient narratives. Challenges arise from the unstructured, labor-intensive, and complex nature of these narratives. Users need to contextualize, reconcile, and validate results to establish accurate insights. Large Language Models (LLMs) are a component of Generative AI and present immense possibilities to ease and enhance safety signal analysis. However, they are imperfect due to lack of context, limited representative data, and inadequate data preparation. In this paper, we demonstrate NLP as a means to enhance the value of Generative AI. First, we present a process workflow making use of SAS Viya, open-source LLMs, and vector databases. Next, we demonstrate how NLP helps establish accurate and governable embeddings, a key component in data hydration and inferencing. Entity definition and extraction techniques reduce noise and enhance the relevance of queries. Finally, we illustrate how a decision flow integrates functionality across the Analytical Life Cycle. We conclude that an integrated approach to safety signal analysis can increase productivity and provide high-quality results and insights. We also highlight considerations necessary for seamless integration of a wide range of analytical tools and their complementary benefits.

SD-426 : Shift gears with 'gt': Finely tuned clinical reporting in R using "gt" and "gt summary" packages
Rajprakash Chennamaneni, Jazz Pharmaceuticals
Sudhir Kedare, Jazz Pharmaceuticals
Jagan Mohan Achi, Jazz Pharmaceuticals

Efforts are underway to use open-source technologies like R and Python for FDA regulatory submission. Based on ongoing initiatives with FDA, there is a strong likelihood that FDA will embrace regulatory submissions utilizing open-source technologies along with SAS. There are numerous existing and emerging packages in R for presenting data in tabular format. This paper will focus on the "gt" package and its extension, the "gt summary" package. The "gt" package framework uses table header, stub, column labels, spanner column labels, table body, and table footer components to create summary reports. We will use these components to demonstrate the ease and flexibility of developing clinical reports. Furthermore, we will leverage the "gt summary" package to generate descriptive statistics tables, efficacy outputs, inline tables, and a few custom tables.

SD-429 : Build up Your Own ChatGPT Environment with Azure OpenAI Platform
Bill Zhang, ClinChoice Inc.
Jun Yang, ClinChoice Inc.

ChatGPT from OpenAI is a state-of-the-art chatbot technology that uses natural language processing to understand and respond to user queries. With ChatGPT, you can do your daily work more efficiently and effectively. This article provides a step-by-step guide on how to configure and extend a customized, secure ChatGPT environment with the Azure OpenAI platform for your organization. The article starts with an introduction to the topic and explains why it is important to configure and deploy a customized ChatGPT environment for the pharmaceutical industry. It then provides architecture information on an internal ChatGPT environment built with the Azure OpenAI platform and Azure AI Search services, explaining what they are and how they work together. It then describes the methodology used to configure and deploy the environment, explaining the steps involved and providing details on how to implement each step, including OpenAI model deployment, Azure AI Search configuration, internal business document ingestion as a ChatGPT extension, and web server integration with the OpenAI API. The article also gives simple user guides and tips for using AI features such as general chat and completion, a SAS/R/Python programming copilot, and internal document query. Finally, it provides some recommendations for future research and development.

SD-431 : inspectoR: QC in R? No Problem!
Steve Wade, Jazz Pharmaceuticals
Sudhir Kedare, Jazz Pharmaceuticals
Matt Travell, Jazz Pharmaceuticals
Chen Yang, Chen Yang
Jagan Mohan Achi, Jazz Pharmaceuticals

More organizations are starting to embrace open-source technologies to perform tasks traditionally completed in SAS®. One such activity is the QC of datasets, tables, and figures in the process of producing TLFs. Independent programming is done for many of those TLFs, and the results from both programmers are compared. Jazz has developed the inspectoR package, an alternative to the SAS® COMPARE procedure, to allow QC performed in R to be compared back to datasets generated using SAS®. In this paper, we demonstrate how inspectoR compares these datasets and produces a report showing the findings. The report produced by inspectoR is much like PROC COMPARE output but is produced using HTML in a more readable format. inspectoR has proven to be a valuable tool in helping to transition QC tasks to R while maintaining the level of quality expected from SAS® systems.

SD-444 : Five Reasons To Swipe Right on PROC FCMP, the SAS Function Compiler for Building Modular, Maintainable, Readable, Reusable, Flexible, Configurable User-Defined Functions and Subroutines
Troy Martin Hughes, Datmesis Analytics

PROC FCMP (aka the SAS® function compiler) empowers SAS practitioners to build our own user-defined functions and subroutines: callable software modules that containerize discrete functionality and effectively extend the Base SAS programming language. This presentation, taught by the author of the 2024 SAS Press PROC FCMP book, explores five high-level problem sets that user-defined functions can solve. Learn how to hide a hash object (and its complexity) inside a function, how to design a format (or informat) that calls a function, and even how to run a DATA step (or procedure) inside a DATA step using RUN_MACRO! Interwoven throughout the discussion are the specific software quality characteristics (such as modularity, flexibility, configurability, reusability, and maintainability) that can be improved through the design and implementation of PROC FCMP user-defined functions!

Statistics and Analytics

ST-113 : Multiple Logistic Regression Analysis using Backward Selection Process on Objective Response Data with SAS®
Girish Kankipati, Seagen Inc
Jai Deep Mittapalli, Seagen Inc.

Logistic regression is a process of modeling the probability of a discrete outcome given one or more input variables. The most common logistic regression models a binary outcome such as true/false or yes/no, whereas multinomial logistic regression is used when there are more than two possible discrete outcomes. Logistic regression is a useful analysis method for classification problems, when you are trying to determine which subgroup a new sample fits best. In oncology clinical trials, analyzing objective response data using this type of analysis helps select key covariates and stratification factors, particularly when an investigational product is transitioning from first-in-human to late-phase trials. Such factors are selected by determining probability and confidence interval values using regression models. This paper discusses multiple logistic regression procedures and shows an example of how to select the covariates in an oncology study. The paper also explains how to input objective response data into a logistic procedure to select covariates such as age, weight, or lines of therapy. The outcome of this analysis includes 95% CIs and p-values for the selected covariates. The paper also explains the difference between the outcomes of logistic regression with and without backward selection with the help of a sample mock. This methodology will help statisticians and programmers understand the data and build more robust statistical models.

ST-164 : Data Literacy 101: Understanding Data and the Extraction of Insights
Kirk Paul Lafler, sasNerd

Data is ubiquitous and growing at extraordinary rates, so a solid foundation of data essentials is needed. Topics include the fundamentals of data literacy, how to derive insights from data, and how data can help with decision-making tasks. Attendees learn about the types of data - nominal, ordinal, interval, and ratio; how to assess the quality of data; explore data using visualization techniques; and use selected statistical methods to describe data.

ST-192 : Generative Artificial Intelligence in sample size estimation - challenges, pitfalls, and conclusions
Igor Goldfarb, Accenture
Sharma Vikas, Accenture

The last couple of years have been characterized by the rapid expansion of Generative Artificial Intelligence (GenAI), and leading pharma companies plan to invest billions of dollars in the development of GenAI in upcoming years. The goal of this work is to increase the awareness of investigators, scientists, and statisticians about the challenges and risks of applying GenAI to calculate the sample size for a prospective clinical trial. The authors assessed the performance, benefits, and risks of using GenAI for sample size calculations in clinical study design. One of the popular GenAI products, Chat Generative Pretrained Transformer (ChatGPT), was chosen as a working tool. The sample size given in the completed study was replicated using the commercial software nQuery and ChatGPT. The authors found that ChatGPT failed to produce the right outcome (nQuery returned correct results) while demonstrating a spectrum of errors. ChatGPT also did not meet the reproducibility test: the tool returned various results in response to the same request asked multiple times. The authors discuss the potential reasons for these observations and assume that the fundamental properties of the Large Language Models (LLMs) at the core of GenAI may be responsible for these outcomes. While the use of GenAI is very promising, there are many challenges and limitations in its application at its current stage (e.g., bias, hallucinations, complexity, replicability, etc.). In general, scientists and statisticians should exercise caution in using GenAI. Companies planning a clinical study are advised to hire experienced biostatisticians for sample size estimation.

ST-199 : Demystifying Incidence Rates: A Step-by-Step Guide to Adverse Event Analysis for Novice Programmers
Yuting Peng, Amgen, Inc
Ruohan Wang, Amgen, Inc

This paper delves into the intricacies of Adverse Event (AE) reporting within the realm of clinical trials, focusing on three distinct methods of treatment-emergent AE incidence rate computation: Incidence Rate (IR), Event Incidence Rate Adjusted by Patient-Years (EIR), and Exposure-Adjusted Incidence Rate (EAIR). While Incidence Rate (IR) and Event Incidence Rate (EIR) have traditionally found favor in clinical trial assessments, this study is prompted by a recent information request from the FDA during a submission of a supplement Biologics License Application (sBLA), advocating the adoption of Exposure-Adjusted Incidence Rate (EAIR) for summarizing AE data. This paper aims to provide a comprehensive understanding of each method by delineating their definitions, comparing their nuances, and contrasting their applications. Furthermore, the article extends its contribution by elucidating the structuring of Analysis Data Model (ADaM) datasets and offering corresponding SAS code implementations, fostering practical insights for researchers navigating AE reporting complexities. Through this exploration, the paper seeks to enhance comprehension, guide methodological choices, and contribute to the evolving landscape of clinical trial data analysis and regulatory compliance.
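
For orientation, the three measures can be sketched in Python under commonly used definitions, which are assumptions here because the abstract does not state the formulas: IR as the percentage of subjects with at least one event, EIR as the number of events per total patient-years of exposure, and EAIR as the number of subjects with an event per total time at risk (time to first event for subjects with an event, full exposure otherwise). The paper provides SAS code; the class and function names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Subject:
    exposure_years: float                          # total treatment exposure
    n_events: int                                  # occurrences of the AE of interest
    years_to_first_event: Optional[float] = None   # None when no event occurred


def incidence_rates(subjects: List[Subject], per: float = 100.0) -> Dict[str, float]:
    """Compute IR (%), EIR, and EAIR (both per `per` patient-years)."""
    n_with_event = 0
    n_events = 0
    total_exposure = 0.0
    time_at_risk = 0.0
    for s in subjects:
        total_exposure += s.exposure_years
        n_events += s.n_events
        if s.n_events > 0:
            if s.years_to_first_event is None:
                raise ValueError("subject with an event needs years_to_first_event")
            n_with_event += 1
            time_at_risk += s.years_to_first_event  # censored at first event
        else:
            time_at_risk += s.exposure_years
    return {
        "IR_pct": 100.0 * n_with_event / len(subjects),
        "EIR": per * n_events / total_exposure,
        "EAIR": per * n_with_event / time_at_risk,
    }


if __name__ == "__main__":
    # Made-up data: three subjects, two of whom had the event.
    data = [Subject(1.0, 2, 0.5), Subject(1.0, 0), Subject(2.0, 1, 1.5)]
    print(incidence_rates(data))
```

The difference between EIR and EAIR in this sketch lies in both the numerator (events versus subjects) and the denominator (full exposure versus time at risk).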

ST-208 : Bayesian Methods in Survival Analysis: Enhancing Insights in Clinical Research
Vadym Kalinichenko, Intego Group LLC

Survival analysis has long been a cornerstone in understanding time-to-event data. However, traditional survival analysis often relies on frequentist statistics, which can be limited in handling complex scenarios and incorporating prior knowledge. Hence, it is a prudent decision to integrate Bayesian methods, a transformative approach that offers a fresh perspective to survival analysis. This paper peers into the application of Bayesian methods in survival analysis, presenting a nuanced perspective that enhances insights in clinical research. Bayesian approaches provide a robust framework for handling complex survival data, incorporating prior knowledge, and facilitating more informed decision-making. This paper demonstrates how Bayesian approaches increase the understanding of survival outcomes, contributing significantly to the advancement of clinical research. Attendees will gain insights into the advantages of Bayesian techniques, contributing to a more comprehensive and detailed interpretation of survival data, and ultimately enhancing the quality of conclusions drawn from clinical research studies.

ST-234 : A unique and innovative end-to-end demand planning and forecasting process using a collection of SAS products
Stephen Sloan, Dawson D R

Forecasting demand can be a very tricky process. Questions arise about which statistical algorithms to use when forecasting based on past sales, how to incorporate business knowledge into the forecast, planning for unforeseen events, and planning for unique events that would not be predictable from sales history. As an example, COVID caused many forecasts to be wrong about quantities when purchasing switched from services to goods, from in-store sales to remote purchases, and from work in the office to remote work. Even when total quantities were forecasted incorrectly, the percentage distribution of the sales for some of the subcategories within the larger categories was often accurate. The solution, then, is to leverage the useful information from the statistical forecasts while allowing the people who know the business to make individual or mass updates. All of this can be accomplished using existing SAS products: SAS EG and SAS DI to read in, manipulate, and output data; HPF, which underpins SAS Enterprise Miner, to create statistical forecasts; and SAS FM, which incorporates Excel while remaining within SAS, to allow users to make individual and mass changes to SAS data sets.

ST-251 : Dealing with Missing Data: Practical Implementation in SAS and R
Isabella Wang, Eli Lilly & Co.
Jin Xie, Eli Lilly & Co.
Lauren George, Eli Lilly & Co.

Missing data is a prevalent challenge encountered in Phase III clinical trials. The appropriate handling of missing data is crucial and depends on the specific estimand under consideration. In this paper, we will delve into realistic examples of five popular missing data imputation methods, accompanied by relevant R and SAS code. This paper aims to equip clinical trial professionals with practical tools to address missing data effectively and to bridge the gap between theoretical understanding and actual application in handling missing data.

ST-297 : Relative Dose Intensity in Oncology Trials: A Discussion of Two Approaches
Christiana Hawn, Catalyst Flex
Dhruv Bansal, Catalyst Flex

This paper will explore the unique challenges of calculating relative dose intensity in early-phase oncology studies. Relative dose intensity, defined as the actual dose received divided by the prescribed dose according to the protocol, is an important endpoint in clinical trials. This is a straightforward calculation for drugs with daily pill administration; however, it becomes more complicated when dealing with intravenous (IV) drugs on varying dosing schedules and cycle lengths, as is common in early-phase oncology studies. This paper will present a mock scenario representing a typical early-phase oncology study with multiple schedules and explore two different methods of calculating relative dose intensity. The first approach compares the number of doses received with the number of expected doses based on the number of cycles completed. This is a straightforward calculation and easy to interpret but does not account for time factors such as dosing delays and skipped cycles. The second approach compares the actual number of doses received with the expected number of doses based on the duration of treatment. We will demonstrate that the resulting dose intensity values can vary significantly between the two approaches. We will include examples of ADaM specifications and programming in SAS and discuss the reasons to consider each approach.
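
The contrast between the two approaches can be made concrete with a small Python sketch. The paper itself uses SAS and ADaM specifications; the function names, parameters, and the example schedule below are assumptions chosen only to illustrate how a dosing delay affects each calculation.

```python
def rdi_by_cycles(doses_received: int, cycles_completed: int,
                  doses_per_cycle: int) -> float:
    """Approach 1: expected doses come from the number of cycles completed."""
    expected = cycles_completed * doses_per_cycle
    if expected <= 0:
        raise ValueError("expected number of doses must be positive")
    return 100.0 * doses_received / expected


def rdi_by_duration(doses_received: int, treatment_days: float,
                    cycle_length_days: float, doses_per_cycle: int) -> float:
    """Approach 2: expected doses come from the time actually spent on treatment."""
    expected = treatment_days / cycle_length_days * doses_per_cycle
    if expected <= 0:
        raise ValueError("expected number of doses must be positive")
    return 100.0 * doses_received / expected


if __name__ == "__main__":
    # Hypothetical subject: 2 doses per 21-day cycle, 3 cycles completed with
    # all 6 doses given, but delays stretched treatment to 84 days.
    print(rdi_by_cycles(6, 3, 2))         # 100.0
    print(rdi_by_duration(6, 84, 21, 2))  # 75.0
```

In this made-up case the cycle-based value hides the delay entirely, while the duration-based value drops because 84 days would have allowed four cycles.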

ST-303 : Source Data Quality Issues in PopPK/PD Dataset Programming: a Systematic Approach to Handle Duplicates
Ibrahim Priyana Hardjawidjaksana, SGS Health Science
Els Janssens, SGS Health Science
Ellen Winckelmans, SGS Health Science

In a perfect world, source data used for PopPK/PD dataset programming would be complete and seamless, but practice teaches us that this is rarely the case. Two of the most prevalent issues affecting data quality are data duplication and the absence of pivotal data. These issues can present a challenge for programmers when preparing the datasets, as they cause merging errors and failures of standard programming logic and can increase the complexity of the program, resulting in more time and effort spent on programming. The standard approach for PopPK/PD datasets is to comment out records containing data quality issues and record the reason in a specific variable. However, the decision to omit data from the dataset, and how to handle data quality during programming, requires careful consideration since it can have a significant impact on the accuracy and integrity of the analysis. It is therefore crucial for both the PK analyst and the programmer to be aware of the risks associated with the decisions taken. Good coordination and understanding between both parties are necessary, and reaching a mutual agreement is imperative. We will share our view on how dataset programming prior to database unblinding can be advantageous for early issue detection, and how standardizing the assumptions used to tackle a given data quality issue will enhance the completeness of the PopPK/PD dataset and accelerate the analysis process.

ST-334 : Versatile and efficient graphical multiple comparison procedures with {graphicalMCP}
Ethan Brockmann, Atorus Research
Dong Xi, Gilead Sciences, Inc.

Multiple comparison procedures (MCPs) are widely used in confirmatory clinical trials to control the probability of making false positive claims. Graphical approaches provide a flexible framework to accommodate commonly-used MCPs and to create more powerful procedures. To allow both flexibility and efficiency, we've created graphicalMCP, an R-native package that is lightweight, performant, and educational. Using an extensible approach to allow different test types in different parts of a graph, the package covers all existing R packages that use graphical approaches. Using vectorization and shortcuts, it greatly reduces simulation times compared to other similar packages, without using more efficient languages such as C++, which allows better compatibility with different computing environments. It also has the capability to draw graphical MCPs using nodes and directed edges, giving users great visualization along with full quantitative details.

ST-338 : Bayesian Additive Regression Trees for Counterfactual Prediction and Estimation
Michael Lamm, SAS

Bayesian additive regression trees (BART) are consistently cited as one of the best-performing classes of models for regression-based approaches to estimating counterfactual outcomes. The properties of BART models that make them well suited for this task are their flexibility, the good generalization of the default model, and the ability to take a Bayesian approach to uncertainty quantification. These properties allow BART models to mitigate many of the common concerns that arise from using a regression-based approach to counterfactual estimation. Namely, the flexibility of BART models enables you to capture potentially complex interactions and treatment effect heterogeneity, while the typically good generalization of the default BART models can limit overfitting to the observed data and preserve the quality of counterfactual predictions. This paper provides a brief introduction to BART models, discusses their implementation in SAS® Visual Statistics software using the BART procedure, and demonstrates how you can use them to estimate counterfactual outcomes and causal effects defined in a counterfactual framework.

ST-339 : Bayesian Hierarchical Models with the Power Prior Using PROC BGLIMM
Fang Chen, SAS
Yi Gong, SAS

The power prior provides a dynamic approach to translate data information into distributional information about the model parameters. Since its introduction, the power prior has played an increasingly prominent role across a large number of disciplines. As popular as the prior has become, a software problem persists: the construction of the power prior based on data sets is often difficult to implement in general Bayesian software such as PROC MCMC, and relies on clever programming solutions that are problem-specific and hard to generalize. In this paper, we illustrate new features in SAS' BGLIMM procedure that enable you to fit the power prior to many models (hierarchical generalized linear models, repeated measurement models, missing data problems, etc.) with the simplest setup. We also discuss model selection, the choice of the power parameter a0 in settings with single and multiple historical data sets, and interpretation of the specified a0 value.
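
For readers unfamiliar with the construction, the power prior in its usual form raises the historical-data likelihood to the power a0. The notation below (theta for the parameters, D_0 for historical data, D for current data, pi_0 for the initial prior) is introduced here for illustration and is not taken from the paper.

```latex
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta),
\qquad 0 \le a_0 \le 1,
\\[4pt]
\pi(\theta \mid D, D_0, a_0) \;\propto\; L(\theta \mid D)\, L(\theta \mid D_0)^{a_0}\, \pi_0(\theta).
```

Setting a0 = 0 discards the historical data entirely, while a0 = 1 pools the historical and current data with equal weight.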

ST-366 : MLNR or Machine Learning in R
Chuck Kincaid, Experis Business Analytics

Are you eager to dive into the fascinating world of machine learning and leverage the power of R to unlock its true potential? This presentation will equip you with key machine learning techniques and navigate the tidymodels suite of packages to tackle real-world challenges. We will use examples to build machine learning models, exploring the when, the why and the how. The presentation will be accessible to those with all levels of machine learning expertise, though most helpful for those new to the area. Familiarity with base R and the tidyverse would be beneficial.

ST-381 : Opportunities and Challenges for R as an open-sourced solution for statistical analysis and reporting, from vendor's perspective
Peng Zhang, CIMS Global
Lizhong Liu, CIMS Global
Tai Xie, CIMS Global

In recent years, there has been a growing shift from SAS to open-source R for statistical analysis and reporting. Several large pharma companies have conducted mock submissions to the FDA, which advances further open-source applications in clinical trial reporting. While most of the experience we can borrow comes from large pharmaceutical companies, there is little literature or material on how to start an R practice at small biotechnology companies, CROs, or data service companies, e.g., what can be done through R, team building, training procedures, R development focus, and the application of R in the current clinical trial industry. In this paper, we will review current practices of R and share our approach to standardization, automation, and further considerations in the pharmaceutical industry.

ST-414 : Estimating Time to Steady State Analysis in SAS
Richard Moreton, Merck & Co., Inc., Rahway, NJ, USA
Lata Maganti, Merck

For chronic use drugs, the time it takes to reach steady state drug concentrations needs to be estimated. Steady state is said to be reached when, after multiple doses of a drug, there is no meaningful difference between pharmacokinetic (PK) profiles of successive doses. An overview of three statistical methodologies is presented along with SAS code. These methodologies include NOSTASOT (No-Statistical-Significance-of-Trend), effective half-life for accumulation, and nonlinear mixed effects modeling. NOSTASOT requires the fewest assumptions, while effective half-life and nonlinear modeling each rely on several assumptions. In order to perform the analyses, we will touch upon advanced programming with PROC IML, PROC MIXED, and PROC NLMIXED.

ST-425 : iCSR: A Wormhole to Interactive Data Exploration Universe
Sudhir Kedare, Jazz Pharmaceuticals
Steve Wade, Jazz Pharmaceuticals
Chen Yang, Chen Yang
Matthew Travell, Jazz Pharmaceuticals
Jagan Mohan Achi, Jazz Pharmaceuticals

In the past decade, CDISC has brought standardization to clinical data, significantly benefiting clinical development by allowing regulatory reviewers to review submitted data with efficiency and precision. Despite these advances, bottlenecks such as static TFL report submission hinder the overall efficiency of the review cycle. At Jazz, we are continuously seeking innovative ways to explore data to establish a foundation for digital transformation. Namely, we have developed the Interactive Clinical Study Reports (iCSR) application, utilizing CDISC-standardized data and RShiny capabilities to enable interactive data exploration. This paper demonstrates how iCSR facilitates real-time dynamic data exploration and significantly improves the data review process by providing fast insights and enabling informed decision-making. iCSR comes with an intuitive UI and minimal onboarding. For pharmaceutical organizations, iCSR is a promising solution to optimize data review and lead efforts to get effective therapies in patients' hands as early as possible.

Strategic Implementation & Innovation

SI-136 : Agile, Collaborative, Efficient (ACE): A New Perspective on Data Monitoring Committee Data Review Preparation
Ke Xiao, Boehringer Ingelheim

Data Monitoring Committees (DMCs) have been part of clinical trials since the 1960s (FDA, 2006). Nowadays, an increasing number of sponsors utilize DMCs in various situations to monitor safety data and critical efficacy data and to ensure the integrity of study conduct. Given the different purposes of DMCs, there are diverse approaches to facilitate DMC data review. In this paper, we will discuss certain strategies and processes that Boehringer Ingelheim (BI) implements for DMCs. To begin with, we will briefly describe the general purpose of a DMC as well as its roles and responsibilities in BI, followed by an introduction to BI's independent safety/statistical analysis team (iSAT). The focus is to present three models that iSAT, as an independent team within the sponsor, applies to assist DMC data review efficiently and in a timely manner. We will conclude by sharing experience and lessons learned from supporting DMCs in randomized non-pivotal and/or open-label registrational studies to open further discussion.

SI-140 : A fear of missing out and a fear of messing up : A Strategic Roadmap for ChatGPT Integration at Company Level
Kevin Lee, Karuna Therapeutics

Does your organization allow ChatGPT at work? The answer might depend on where you work. Many organizations do not allow ChatGPT at work. The truth is that, for organizations, ChatGPT brings both a fear of missing out and a fear of messing up. But, just as with past new technologies such as cloud computing and social media, organizations will eventually integrate ChatGPT or another Large Language Model (LLM). This paper is for those, especially in Biometrics, who want to initiate ChatGPT integration at work. It presents how a Biometrics department can lead the integration of LLMs, focusing on the exemplary model ChatGPT, across an entire enterprise, even in situations where the organization restricts or prohibits ChatGPT usage at work. The roadmap outlines key stages, starting with an introduction to LLMs and ChatGPT, followed by potential risks and concerns and the benefits and diverse use cases. The roadmap will emphasize how the Biometrics function leads the building of a cross-functional team to initiate ChatGPT integration and build the policy and guidelines. Then, the roadmap discusses the crucial aspect of training, emphasizing user education and engagement based on company policies. The roadmap finishes with a Proof of Concept (PoC) to validate and evaluate ChatGPT's applicability to organizational needs and its compliance with company policies. This paper can serve as a valuable resource for navigating the implementation journey of ChatGPT, providing insights and strategies for successful integration, even within the confines of organizational limitations on ChatGPT usage.

SI-160 : LLM-Enhanced Training Agent for Statistical Programming
Jason Zhang, Merck
Jaime Yan, Merck

This paper presents a cutting-edge training agent for statistical programming, leveraging a Large Language Model (LLM) to streamline corporate onboarding and skill development. Central to this system are OpenAI's LLM functionalities, tailored prompts, and a comprehensive support document system, integrating GPTs, FDA CDISC guidelines, and company-specific materials. The agent excels in creating personalized learning paths and plans, addressing the common challenge of systematic learning in corporate environments. It effectively curates relevant materials and conducts assessments, fostering a self-learning culture complemented by interactive workshops. This method not only enhances training efficiency and cost-effectiveness but also ensures employees are equipped with necessary statistical programming and regulatory compliance skills, promoting continuous professional growth. Overall, this LLM-powered training agent signifies a transformative shift in corporate training, offering a scalable and personalized approach to learning and development.

SI-185 : The Role of the Blinded Programmer in Preparation of Data Monitoring Committee Packages (for Clinical Trials)
Binal Mehta, Merck & Co.
Patel Mukesh, Merck & Co INC

A data monitoring committee (DMC) is a group of clinicians and biostatisticians with expertise in clinical trials who are appointed by study sponsors to provide an independent assessment of the safety, scientific validity, and integrity of clinical trials. In the United States, the FDA requires the sponsors to establish a DMC in all clinical trials that assess new treatment interventions. The FDA also strongly recommends establishing a DMC in clinical studies that have double-blinded treatment assignment, have substantial safety concerns, or have a significant impact on clinical practice. Blinding in clinical trials is a procedure in which one or more personnel are unaware of which treatment participants are receiving, a technique that maintains the ethics and integrity of a study and reduces bias in the design or execution of the trial and its results. A blinded programmer will use the blinded data and dummy treatment to produce analysis-level datasets and clinical reports in support of the DMC package. A separate unblinded programmer will use these analysis programs on unblinded data and actual treatment to produce analysis-level datasets and clinical reports for DMC review. The blinded programmer plays a key role in developing and maintaining all analysis programs and maintains communication with the unblinded programmer. This paper will walk through the role of the blinded programmer in preparation of the DMC package and the critical communication with the unblinded programmer when unblinding happens, to resolve any challenges and issues.

SI-189 : Automating the annotation of TLF mocks Using Generative AI
Vidya Gopal, AstraZeneca

In this paper we will be discussing the use of Generative AI to automate the mapping and annotation of ADaM variables to TLF mocks to increase the efficiency, quality, and accuracy of TLFs. The TLF mock provides a specification for how the TLF should be created from the ADaM dataset. In a study, annotating the mock is crucial to provide quality and traceability. Annotating a mock is time and labor intensive due to the volume of TLFs and ADaM variables. This proof-of-concept proposal would entail exploring the use of Generative AI prompts to match the variable name given in the ADaM specification document to the general name given in the TLF mock shell. For instance, if the ADSL specification includes a variable named AGECAT, this idea would aim to map this variable with the "Age Group" value in a demographic table template. This can be done through techniques such as similarity analysis of ADaM dataset metadata. Further work can examine approaches using prompt engineering to explore different types of analysis. Once this process is completed, a review can be performed in order to validate the results.
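
To make the matching idea concrete, here is a minimal sketch that pairs a mock-shell row label with the ADaM variable whose label is most similar. It uses plain character-level similarity from Python's standard library as a stand-in for the generative AI matching the paper proposes; the function name `best_match` and the variable labels in the example are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(shell_label, adam_labels):
    """Return the ADaM variable whose label is most similar to shell_label.

    adam_labels maps variable names to their labels, e.g. from a
    specification document (the entries used below are made up).
    """
    def score(a, b):
        # Ratio in [0, 1]; 1.0 means the two strings are identical.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    return max(adam_labels, key=lambda var: score(shell_label, adam_labels[var]))


# Hypothetical ADSL specification labels.
adsl_spec = {
    "AGECAT": "Age Category",
    "SEX": "Sex",
    "TRT01P": "Planned Treatment for Period 01",
}
print(best_match("Age Group", adsl_spec))
```

A purely lexical score like this will miss synonyms that share few characters, which is one reason a language-model-based approach, followed by the human review the abstract mentions, may be preferable in practice.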

SI-190 : Navigating Success: Exploring AI-Assisted Approaches in Predicting and Evaluating Outcome of Clinical Trials and Submissions
Ruohan Wang, Amgen, Inc
Chris Qin, AMGEN, INC

Moving beyond the prevalent application of AI in drug discovery in the pharmaceutical industry, this paper aims to open a discussion on the broader benefits of AI in predicting and evaluating the outcomes of clinical trials and regulatory submissions. With the goal of helping sponsors mitigate risk factors for higher success rates and foster cost-effective strategies, it seeks to prompt comprehensive brainstorming on 1) identifying potential issues that might trigger regulatory concerns via AI modeling of historical data; 2) evaluating clinical trials through multidimensional analysis to enable proactive early interventions that heighten the probability of favorable outcomes and boost trial success rates. The discussion extends to the strategic organizational setup and workflow needed to implement wide-ranging AI techniques and innovative approaches to meet the goal, underlining cross-functional collaboration and adaptive strategies. This paper is poised to contribute to the growing discourse on how AI-assisted approaches can empower informed decision-making, optimize resource allocation, and increase profitability for sponsors with promising drug candidates, and ultimately benefit patients through receiving approved treatments earlier.

SI-230 : Quality Assurance within Statistical Programming: A Systemic Way to Improve Quality Control
Todd Case, Vertex Pharmaceuticals
Margaret Huang, Vertex Pharmaceuticals, Inc.

This paper is a discussion of how and why Quality Assurance (QA) can be used to improve the Quality Control (QC) process within Statistical Programming. QA is defined as a process that serves as a third layer of checks to generate the selected Tables, Figures, and Listings (TFLs) from the raw data. It starts after the QC programmer completes the QC process. The QA process is designed to ensure the highest quality, confirming that the key efficacy and the key safety reports accurately reflect the underlying data and statistical analysis. The aim of QA activities is to identify and correct any errors or inconsistencies that could potentially impact the interpretation of clinical trial results. One of the major benefits of this process is that it bypasses any issues that may have resulted in errors when matching standard SDTM/ADaM data for any number of reasons. The QA programmer also checks the accuracy of the study documents and ensures consistency across studies. Specific processes and key documentation/guidance are identified, which have resulted in using selective QA (almost always on a pivotal phase III study or other deliverables intended for regulatory review). The content of this paper is based on many years of experience and past formal training of the authors and of both current and prior project team members.

SI-269 : Validating R for Pharma - Streamlining the Validation of Open-Source R Packages within Highly Regulated Pharmaceutical Work
Juliane Manitz, EMD Serono
Anuja Das, Biogen
Antal Martinecz, Certara
Jaxon Abercrombie, Genentech
Doug Kelkhoff, Roche

R Validation Hub is a cross-industry collaboration to support the adoption of R within a biopharmaceutical regulatory setting through appropriate tools and resources that leverage the open source, collaborative nature of the language. Using R in submissions to healthcare regulators often requires documentation showing that the quality of the programming packages used was adequately assessed. This can pose a challenge in R where many of the commonly used tools are open source. Through this paper, we will highlight the R Validation Hub's risk assessment framework for R packages that has been utilized by key pharma companies across the industry. We also showcase the products our working groups have developed including the {riskmetric} R package that evaluates the risk of an R package using a specified set of metrics and validation criteria, and the {riskassessment} app that augments the utility of the {riskmetric} package within a Shiny app front end. Lastly, we will illustrate a prototype of a technical framework to maintain a 'repository' of R packages with accompanying evidence of their quality and the assessment criteria. All of our work is designed to facilitate the use of R within a highly regulated space and ease the burden of using R packages within a validated environment.

SI-291 : PHUSE Safety Analytics Working Group - Overview and Deliverables Update
Nancy Brucken, IQVIA
Mary Nilsson, Eli Lilly & Company
Greg Ball, ASAP Process Consulting

The PHUSE Safety Analytics Working Group, a cross-disciplinary cross-industry collaboration, is working to improve the content and implementation of clinical trial safety assessments for medical research, leading to better data interpretations and increased efficiency in clinical drug development and review processes. The Working Group has produced numerous deliverables (Conference Posters and Presentations, White Papers, Publications, Blogs, etc.) over the past 10 years and has many ongoing projects. This presentation will provide an overview of the Working Group and its associated project teams, and share an update of the teams' progress, key deliverables for awareness and a summary of ongoing projects.

SI-319 : A Change is Gonna Come: Maintaining Company Culture, Managing Time Zones, and Integrating Teams after a Global Acquisition
Lydia King, Catalyst Clinical Research

As a small CRO that prides itself on having a positive company culture with emphasis on work-life balance, our US-based Biometrics team was quite good at 'running the engine' with our portfolio of clinical trial clients. And although expansion had been occurring for several years as part of the larger company strategy, there had been minimal impact to our department, until 2023. We acquired a Biometrics company from India, and suddenly were immersed in integration activities. So how did we keep that company culture and not fall victim to our fear of change? While it is still very much a work in progress, this paper will outline the important strategies put into place to ensure a successful transition. We will detail the thoughtful approach of a multi-functional leadership team that focused on the Day 1 message of 'Do No Harm', assurances of job security, a plan for analysis of gaps and current work-state, introductions of our teams, and implementation and acknowledgement of quick wins. We will discuss how a future state work plan includes giving a voice to your teams, respecting different time zones, and sharing responsibilities. The intent of this paper is to demonstrate how to embrace change rather than fear it, whether it's company acquisitions or departmental changes. Take change as a positive experience that could improve career opportunities. Through times when you think you can't last long, you should know you can carry on.

SI-346 : aCRF Copilot: Pioneering AI/ML Assisted CRF Annotation for Enhanced Clinical Data Management Efficiency
Chaitanya Pradeep Repaka, AI Lens Tech Pvt Ltd
Santhosh Karra, TechData Service Company LLC

In clinical programming, the aCRF Copilot, leveraging Artificial Intelligence (AI), Natural Language Processing (NLP), and Machine Learning (ML), streamlines Case Report Form (CRF) annotation. It suggests and adds SDTM annotations and bookmarks to the aCRF, following CDISC guidelines. This reduces the time and manual effort traditionally needed in CRF annotation by intelligently matching aCRF text with current SDTM standards. The tool supports various domains such as Adverse Events and Vital Signs. It has shown ~70% accuracy (ML metrics: ~92% precision, ~62% recall) in preliminary tests, which is expected to improve with user feedback through reinforcement learning. The aCRF Copilot has an easy-to-use interface, complete with a built-in mentor chatbot built on fine-tuned state-of-the-art Large Language Models (LLMs; Mistral 7b, Zephyr 7b), designed to aid both beginner and expert users with different levels of SDTM knowledge while ensuring data security and compliance with data protection regulations. Future enhancements will include protocol and metadata analysis for improved accuracy and complete SDTM automation, making it a cost-effective clinical trial solution. This paper highlights aCRF Copilot as a pioneering AI/ML tool in CRF annotation, discussing its capabilities, user feedback, and future potential in clinical data management. It will also provide an understanding of how AI and ML can be effectively utilized as supportive tools in clinical data management tasks.
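
For readers less familiar with the metrics quoted above, the sketch below shows how precision, recall, and their harmonic mean (F1) are computed from annotation counts. The abstract does not say how its accuracy figure was derived, so this is only a general illustration; the function names and the example counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    tp: suggested annotations that were correct
    fp: suggested annotations that were wrong
    fn: correct annotations the tool failed to suggest
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct suggestions, 2 wrong, 2 missed.
print(precision_recall_f1(8, 2, 2))

# Harmonic mean of the precision and recall quoted in the abstract.
print(round(f1_from_rates(0.92, 0.62), 2))
```

High precision with lower recall, as reported, would mean the suggestions that are made are usually right while a fair share of annotations still need to be added by hand. The harmonic mean of the two quoted rates works out to about 0.74.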

SI-362 : SASBuddy: Enhancing SAS Programming with Large Language Model Integration
Karma Tarap, BMS
Nicole Thorne, BMS
Tamara Martin, BMS
Derek Morgan, Bristol Myers Squibb
Pooja Ghangare, Ephicacy

SASBuddy, "your friendly SAS programming assistant," is a pioneering SAS macro designed to revolutionize SAS programming through Large Language Models (LLMs). This tool empowers users, particularly those with limited SAS coding expertise, to generate precise SAS code efficiently by interpreting natural language inputs. The core of SASBuddy's functionality lies in its ability to provide contextually accurate SAS code, tailored to specific datasets based on user queries. A critical feature of SASBuddy is its iterative learning process, which allows for ongoing refinement of its code generation capabilities. Each user interaction contributes to this process, enhancing the accuracy and relevance of the generated code. This continuous improvement mechanism ensures that SASBuddy remains adaptive and efficient, meeting the changing demands of SAS programming. SASBuddy represents a significant stride in making SAS programming more accessible and user-friendly. By integrating natural language processing within a programming context, SASBuddy not only simplifies the code generation process but also illustrates the growing convergence of linguistic technology and programming, heralding a new chapter in the development of intuitive programming tools.

SI-391 : Facilitating Seamless SAS-to-R Transition in Clinical Data Analysis: A Finetuned LLM Approach
Chaitanya Pradeep Repaka, AI Lens Tech Pvt Ltd
Santhosh Karra, TechData Service Company LLC

The rapid evolution of data analysis tools has presented the clinical industry with both opportunities and challenges, particularly in transitioning from traditional SAS-based environments to the more versatile R programming language. This paper introduces a groundbreaking Large Language Model (LLM) specifically finetuned to bridge this gap, enhancing the ease of transition for clinical data analysts. Our model is uniquely tailored to understand and convert SAS code into R, focusing on key areas such as the Study Data Tabulation Model (SDTM), the Analysis Data Model (ADaM), and Tables, Figures, and Listings (TFLs). This LLM serves as an intelligent personal assistant, offering in-depth insights into the comparative dynamics of SAS and R coding methodologies, thereby facilitating a deeper understanding and efficient reproducibility of clinical data analysis processes. Unlike general-purpose models such as ChatGPT, this LLM is designed for local deployment, addressing crucial data privacy concerns inherent in clinical research. Our presentation will detail the model's development, emphasizing its capabilities in handling subject-level and adverse event data. Preliminary results demonstrate a high accuracy in the model's responses, indicating its potential as a transformative tool in the clinical data analysis landscape. The implications of these findings for enhancing data analysis workflows in the clinical sector will be explored, highlighting the model's contribution to both efficiency and data security in clinical research.

SI-408 : Agile Sponsor Oversight of Statistical Programming Activities
Manuela Koska, Koska GmbH
Veronika Csom, Koska GmbH

Agile sponsor oversight of a Contract Research Organization (CRO) for statistical programming involves a dynamic, collaborative, and flexible approach to monitoring and managing the CRO's work. Key characteristics of this method include constant communication, iterative processes, adaptive planning, proactive risk management, stakeholder engagement, and an unwavering focus on the quality of deliverables. In our discourse, we will explain in detail how oversight might be planned, conducted, and documented and highlight how the principles of agility can significantly contribute to its successful execution. A differentiation will be made between the concepts of sponsor oversight and vendor qualification, and it will be described how agile oversight aligns with risk-based principles, providing the flexibility to modify plans as necessary. Additionally, the paper will shed light on the dark sides of oversight and cover the essential qualities and skill sets required in personnel responsible for overseeing statistical programming activities.

SI-446 : One size does not fit all: The need and art of customizing SCE and MDR for end users
Shilpa Sood, Ephicacy Consulting Group
Sridhar Vijendra, Ephicacy Consulting Group

In today's age of cloud computing and AI, it is hard to imagine the end-to-end drug development process done without the use of various tools employed at every stage. The last decade has seen a surge in the number of tools and applications that have come up to serve the pharmaceutical industry, improving the efficiency of the drug development process and shortening time to market. These tools cover the entire gamut, ranging from data capture systems to reporting wizards to automated validators to SCEs and MDRs to AI-based SDTM mappers. As with any other application, the implementation of an SCE or MDR for an organization is a very delicate process, and it offers an opportunity for the organization to re-think the way they get things done. While having a fundamentally clear purpose, SCEs and MDRs also need to offer a large amount of flexibility to the organization that adopts them in order to ensure a seamless transition for their teams. The choice of which features of the application an organization should use decides the return on investment for the organization, and this should also be supported by the product that has been chosen. To ensure a smooth implementation, it is important that the business analysis team and the product design team work closely with the organization to provide a product that is customizable to their requirements yet has a solid roadmap.

SI-452 : Embracing Diversity in Statistical Computing Environments: A Multi-Language Approach
Amit Javkhedkar, Ephicacy
Sridhar Vijendra, Ephicacy Consulting Group

In the transition to Statistical Computing Environments (SCEs), organizations face the challenge of supporting multiple programming languages, including SAS, R, Python, and Julia. Accommodating these diverse languages becomes complex, particularly due to variations in open-source libraries and their versions. This paper explores the feasibility of seamlessly integrating multiple versions of various programming languages within a single platform. It delves into the advantages, limitations, and potential strategies to enhance the experience for the key users in statistical programming teams. Specifically, we examine whether there exists a straightforward solution to this complex task, ultimately aiming to provide insights into optimizing programming and application development in open-source languages for statistical programmers.

SI-461 : Oncology ADaM Datasets Creation Using R Programming: A Comprehensive Approach
Ishwar Chouhan, Atorus Research

Efficient analysis of clinical trial data is crucial in advancing cancer treatment strategies. This presentation showcases using R programming and packages, such as admiralonco, as a comprehensive approach to creating ADaM datasets for oncology studies. This methodology spans initial data preparation to submission-ready datasets, ensuring compliance with industry standards and regulatory guidelines. We will cover pertinent areas such as data extraction and cleaning; variable mapping and standardization; ADaM dataset creation and variable derivation; metadata documentation; quality control and validation; and version control and reproducibility. This systematic and transparent approach aims to enable you to create ADaM datasets, using R programming, consistently, accurately, and compliantly for the analysis of clinical trial data in the oncology domain.

Submission Standards

SS-132 : BIMO Brilliance: Your Path to Compliance Resilience
Jai Deep Mittapalli, Seagen Inc.
Girish Kankipati, Seagen Inc

The FDA's Bioresearch Monitoring (BIMO) program is pivotal in upholding the integrity of data generated in clinical trials and post-marketing surveillance of drug products, safeguarding the rights and safety of human subjects. When filing an NDA or BLA with the FDA, sponsors must include information in the form of participant data listings, site-level summary data sets, and associated metadata as part of the electronic submission package. This paper delves into the critical aspects of such BIMO submissions and the related technical conformance guide, to elucidate the key components the FDA expects in a BIMO package, the significance of the CLINSITE data set, and the four essential listings that must be part of the submission. Furthermore, this paper explores the structural and formatting requirements and compliance standards of a BIMO package. Drawing from the authors' recent BIMO submission experience, this paper offers insights to aid in understanding and preparing for future BIMO submissions. The exploration of BIMO inspection by the FDA extends to the crucial aspect of Financial Disclosure, a fundamental component of regulatory compliance in clinical research. The Financial Disclosure template we include in this paper is an indispensable tool for research teams. By encouraging teams to collect and document financial data in advance, the template streamlines preparation, minimizes potential delays, and enhances overall efficiency by helping teams avoid blind spots in data received or collected. Also included is a macro that we have developed to seamlessly combine individual subject-level data listings into a single PDF with bookmarks and comprehensive pagination.

SS-133 : Cultivating Success with Non-standard Investigator-sponsored Trial Data for FDA Submissions
Jai Deep Mittapalli, Seagen Inc.
Jinit Mistry, Seattle Genetics
Venkatesulu Salla, Seagen

From the moment of extraction of raw data all the way through creating analysis reports, a study programming team goes through multiple steps such as checking the quality of raw data, mapping and programming SDTM and ADaM data sets, etc. A study lead programmer is responsible for all these steps while managing the programming team's assignments and activities, understanding the requirements given by Biostatisticians, working with cross-functional teams, finalizing the scope of the analysis, managing external data, integrations, and so on. Things become even more complex when study deliverables are submitted to regulatory agencies. If the study in question happens to be an investigator-sponsored trial (IST) rather than an in-house trial, the plot thickens: such trials are designed more to suit academic needs for publication than regulatory filings for commercialization and are often conducted without strict adherence to CDISC standards and without collecting the typical spectrum of safety and efficacy data needed to support a submission. This paper details key insights on how to effectively manage the preparation and regulatory submission of ISTs. We will share areas of focus in IST study documents such as protocols, CRFs, and data files, as well as tips on reviewing raw data, creating quality submission-ready SDTM and ADaM data sets based on such non-standard IST data, documenting all important decisions, and several challenges ISTs might present. This paper will benefit study leads handling one or more ISTs with tight timelines for regulatory submission.

SS-137 : Study Start Date - Let's Get it Right!
David Izard, Merck

The US FDA incorporated a clear definition of Study Start Date into binding guidance in December 2014 and has since reinforced it with language added to the Study Data Technical Conformance Guide. Despite these efforts, Sponsors and Service Providers struggle to correctly and consistently establish Study Start Date, leaving them open to Technical Rejection Criteria firing and standards compliance issues during regulatory review. This paper and highly interactive presentation will clearly establish what Study Start Date is and isn't based on regulatory references, elucidate why it is a data-driven study metadata element, discuss the impact of not setting the Study Start Date correctly across your submission elements, and consider similar concepts for filings to other regulatory authorities.

SS-213 : Is a Participation-Level ADaM Dataset a Solution for Submitting Integration Data to FDA?
Sandra Minjoe, ICON PLC

In 2019, the CDISC ADaM team was prepared to publish an Integration solution that allowed for an "integrated ADSL" dataset structured as one record per subject per pool. This ADaM Integration document was never published because it was determined to be challenging for FDA to perform a review when their tools depend on a one-record-per-subject ADSL. There is still no CDISC solution for Integration. In 2022, the CDISC SDS team proposed a standard for handling DM content when a subject participates in a study more than once. It kept DM as one record per subject, allowing the sponsor to determine which of the multiple participations to put into DM, plus added proposed standard domain DC, with the same variables as DM, but structured as one record per subject per participation. The proposed DM/DC solution prompted the CDISC ADaM team to consider a similar ADaM solution: keeping ADSL plus adding a dataset such as ADPL (Participation-Level Analysis Dataset) that contains the same variables as ADSL but structured as one record per subject per participation. A big takeaway is that this solution doesn't break any ADaM rules and can be used now. Can ADPL or something similar be used for integration?

SS-263 : Creating Adverse Event Tables using R and SASSY System
Vicky Yuan, Incyte Corporation

Currently, the pharmaceutical industry is exploring the adoption of the R language for the production of submission packages and TLF generation. The adverse events summary table and the adverse events by system organ class and preferred term table are commonly used in submissions. This paper will illustrate how to create these AE tables using R and packages from the SASSY system. The examples are interesting because they will make R easier for SAS programmers.

SS-285 : Submission Requirements and Magnification of Differences among Regulatory Authorities
Vishwateja Maduri, Tech-Observer UK Ltd
Purushotham Namburi, Tech-Observer UK Ltd
Prashanth Kasarla, Tech-Observer UK Ltd

Globally, multiple clinical research studies are ongoing in the pharmaceutical industry for new drug research and development. For a drug to be approved, the clinical data must be submitted to regulatory authorities such as the FDA (Food and Drug Administration), PMDA (Pharmaceuticals and Medical Devices Agency), EMA (European Medicines Agency) and NMPA (National Medical Products Administration). New directives and regulations are increasingly posing real-time challenges for sponsors/organizations to effectively navigate the complex submissions landscape. These regulatory authorities play a pivotal role in approving new investigational drugs. The e-CTD (electronic Common Technical Document) is the standard format for submitting the clinical data package to the regulatory authorities. Each authority has its own standard guidelines for e-submission of clinical data, and there are also certain differences in the submission packages and their requirements. Sponsors/Research organizations need to adhere to the process of submitting the data and meet the necessary requirements of each regulatory authority. This e-Poster elucidates the submission requirements and the differences between submission requirements for the FDA, PMDA, EMA and NMPA.

SS-290 : Combine PDFs in Submission-ready Format Quick and Easy
Robin Wu, PTC Therapeutics
Lili Li, PTC Therapeutics
Steven Huang, PTC Therapeutics

Portable Document Format (PDF) is an essential component in the regulatory submission package (U. S. Food and Drug Administration, 2013; International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use, 2017). The conventional approaches to combining PDFs include using a SAS macro or utilizing a third-party stand-alone tool. Both approaches are usually time-consuming, unstable, and technically challenging in practice. Meanwhile, very limited research and few utilities have been shared in the SAS community on how to combine PDFs properly and efficiently in a submission-ready format. This paper addresses this gap by introducing a novel approach that implements both a SAS macro and two third-party utilities (a Visual Basic application and a Java application iText) to make the PDF combination quick, easy, and submission-ready.

SS-306 : Leveraging SAS and Adobe Plug-in for CRF Bookmark Generation (Rave studies)
Swaroop Neelapu, Algorics, Inc

The annotated CRF (acrf.pdf) should be dual bookmarked: (1) bookmarks by chronology and (2) bookmarks by forms. Typically, this is a very time-consuming process if done manually. This paper will discuss automation that can be leveraged to complete this process in 15 minutes irrespective of the number of pages in the CRF or the number of visits in the trial. This is a completely menu-driven process with very minimal use of the keyboard. The resulting bookmarks in acrf.pdf are compliant with the standards mentioned in CDISC SDTM Metadata Submission Guidelines version 2.0 (MSG 2.0) and include dual bookmarking by Visits/Forms, among other features like nested levels, running records, inherit zoom magnification, etc. The prerequisites for using this process are: a) a Medidata Rave study; b) SAS software; c) Adobe Acrobat Pro with the capability to export and import bookmarks. There are many 3rd party tools available online for this purpose. For the sake of this presentation, the plug-in software used is from Evermap - "AutoBookmark Professional Plug-in for Adobe Acrobat". This plug-in will install additional menu items in Acrobat for exporting and importing bookmarks from/to a csv file (screenshot shown in attached working draft). (Note: A free version of the plug-in is available for 30-day trial download. Evermap Inc, USA is a certified Technology Partner for Adobe Inc.). Software and versions used: Windows, SAS version 9.4, Adobe Acrobat Pro 2017 (along with Evermap Autobookmark Plug-in menu installed), MS Office 365. Skill level required for use: Beginner Statistical Programmer.

SS-311 : How to generate a submission ready ADaM for complex data
Yilan Xu, Clinchoice
Hu Qu, Clinchoice
Tina Wu, Clinchoice

In clinical trial analysis, composite endpoints are quite common in many phase II or phase III studies. Creation of composite endpoint datasets presents many challenges, not only for programming but also for comprehension. Some complex endpoints require a combination of individual data (CRF collected) and daily data (ePRO). Furthermore, the daily data contains several criteria which involve complicated calculation rules such as thresholds and slopes. An intermediate dataset proves ideal in these cases, offering more specific information and facilitating traceability and verifiability to the original data. The intermediate dataset provides a bridge between the multiple sources of original data and the final standard ADaM. This paper will introduce how to create a submission-ready BDS dataset for a composite endpoint by using several intermediate datasets.

SS-333 : Lead-in and extension trials, how we documented datapoint traceability
Hanne Ellehoj, Lundbeck
Veeresh Namburi, Lundbeck

When generating data packages for extension trials, we faced several challenges, and we want to share some of our experiences, considerations, and decisions made. In our case, participants had completed a lead-in trial and continued into an extension trial. Two independent EDC databases were built and independent SDTM packages were created, reflecting the data from each database. When we studied the protocol/SAP, we realized that some of the datapoints collected in the lead-in trial were needed in the reporting of the extension trial. Specifically: 1) the last assessed datapoints in the lead-in trial should serve as baseline in the extension trial; 2) safety and scale data were to be reported as change from baseline using datapoints from all visits in both the lead-in and extension trials; 3) disease characteristics were collected only once, at the start of the lead-in trial. The solution chosen was to generate intermediate ADaM datasets with datapoints from the lead-in trial. These intermediate ADaM datasets were then added as sources for the ADaM datasets for the extension trial. Our goal was to have good datapoint traceability documented in our submission package, both in data and in metadata (define.xml and ADRG). In this paper, we discuss the challenges and reflections we had, and we share our experience on traceability, which should be both metadata-driven and data-driven.
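
As a rough illustration of point 1 above (the last lead-in assessment serving as baseline for the extension trial), the following Python sketch derives baseline and change from baseline across two trials. It is not the authors' implementation; the record layout and the names `usubjid`, `paramcd`, `adt`, `aval`, `base`, and `chg` are assumptions chosen to resemble common ADaM naming, and the function `add_change_from_baseline` is hypothetical.

```python
def add_change_from_baseline(lead_in, extension):
    """Attach baseline (last lead-in value) and change from baseline
    to each extension-trial record. All field names are assumed."""
    baseline = {}
    # ISO 8601 date strings sort chronologically, so after sorting,
    # later assessments overwrite earlier ones for the same key.
    for rec in sorted(lead_in, key=lambda r: r["adt"]):
        baseline[(rec["usubjid"], rec["paramcd"])] = rec["aval"]

    out = []
    for rec in extension:
        base = baseline.get((rec["usubjid"], rec["paramcd"]))
        chg = None if base is None else rec["aval"] - base
        out.append({**rec, "base": base, "chg": chg})
    return out


# Hypothetical data: subject S01 has two lead-in assessments, S02 has none.
lead_in = [
    {"usubjid": "S01", "paramcd": "SCORE", "adt": "2023-01-10", "aval": 20},
    {"usubjid": "S01", "paramcd": "SCORE", "adt": "2023-03-01", "aval": 18},
]
extension = [
    {"usubjid": "S01", "paramcd": "SCORE", "adt": "2023-06-01", "aval": 15},
    {"usubjid": "S02", "paramcd": "SCORE", "adt": "2023-06-02", "aval": 12},
]
result = add_change_from_baseline(lead_in, extension)
```

In this sketch a subject with no lead-in record gets a missing baseline and a missing change value, rather than a guessed one, which keeps the gap visible for review.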

SS-344 : Piloting into the Future: Publicly available R-based Submissions to the FDA
Benjamin Straub, GlaxoSmithKline

In recent years, statisticians and analysts from both industry and regulatory agencies have increased adoption of open-source software such as R. R brings great benefits from its vibrant open-source community by providing a wealth of cutting-edge statistical tools, extension packages for interactive dashboards and documentation, as well as adaptability to the latest data science trends. In particular, the dashboard-focused R package Shiny has been shown to provide great flexibility and interactivity. However, publicly available drug submissions with R as the core analysis language have been lacking, which limits wider adoption. The R Consortium R Submission Working Group seeks to test the concept that an R-based submission can be bundled into a submission package and transferred successfully to FDA reviewers. As of May 2024, the R Consortium R Submission Working Group has successfully completed three pilot submissions and received FDA CDER response letters. To our knowledge, these are the first publicly available submission packages that include components of open-source languages. In this talk, I will introduce the R Consortium R Submission Working Group and the findings of the completed Pilots 1, 2 and 3, the issues that we encountered, the lessons learned, and the current work being done in Pilot 4.

SS-363 : A Programmer's Insight into an Alternative to TQT Study Data Submission
Hiba Najeeb, Vertex Pharmaceuticals
Raghavender Ranga, Vertex Pharmaceuticals

We detail the submission process for Phase I concentration-QTc data to a regulatory agency, seeking an alternative to conducting a separate Thorough QT/QTc (TQT) study. A waiver from the TQT is significant because it saves time by not requiring a separate study, which is critical to getting therapies to patients faster. The submission datasets to support the QT evaluation align with the FDA Technical specifications document for Submitting Clinical Trial Datasets for Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential of Drugs and CDISC TAUG-QT standards. The submission package also adheres to the FDA Technical Conformance Guide and QT Evaluation Report Submission checklist. This paper delves into the creation of a submission package utilizing Continuous Holter ECG and PK data from Phase I randomized placebo-controlled dose escalation studies. It details the updates required for the SDTM trial design data sets, along with the inclusion of SDTM/ADaM domains that pertain to the QT submission and are specific to the cohorts included in the cardiac safety analysis. In addition, amendments to the aCRF, define.xml and reviewer's guide will be described in this paper, which provides valuable insights into the intricacies of creating a comprehensive submission package for consideration of an alternate approach to the TQT study.

SS-368 : Design Considerations for ADaM Protocol Deviations Dataset in Vaccine Studies
Rashmi Gundaralahalli Ramesh, Merck and Co.
Jeffrey Lavenberg, Merck & Co., Inc.

Per-protocol analysis of immunogenicity data in vaccine trials is essential for evaluating efficacy. The ADaM dataset for protocol deviations (ADPDEV) is critical in excluding samples with clinically important protocol deviations from the analysis. However, challenges arise in determining the impact of deviations in studies with multiple doses and concomitant vaccinations, which require different exclusion rules. For instance, the same deviation term may lead to participant level exclusions or visit level exclusions. If there are multiple concomitant vaccinations, then each vaccination may follow a different rule to qualify for exclusion at each visit. This paper focuses on creating ADPDEV dataset in vaccine studies, specifically addressing studies with multiple doses and/or concomitant vaccinations. We present a systematic approach for determining sample exclusions based on the nature and timing of protocol deviations. The list of protocol deviations received from the clinical team is used as input for the SDTM deviations (DV) dataset, which in turn is used to create ADPDEV. By following the ADaM Occurrence Data Structure (OCCDS) structure, the ADPDEV dataset is constructed to capture one record per subject, per violation, per timepoint, per assay. To accurately identify immunogenicity results corresponding to protocol deviations, we merge the ADPDEV dataset with the analysis dataset for immunogenicity (ADIMM) and derive a per-protocol record level flag (PPROTRFL). Our findings emphasize the importance of defining the exclusion rules based on the protocol deviations and timing. We demonstrate the programming challenges and significance of accurately constructing the ADPDEV dataset to ensure robust per-protocol analysis in vaccine studies.
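
The merge-and-flag step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: the `level` field distinguishing participant-level from visit-level exclusions, the record layout, and the function `derive_pprotrfl` are all hypothetical, and real exclusion rules would come from the study's protocol deviation specifications.

```python
def derive_pprotrfl(adimm, adpdev):
    """Flag immunogenicity records that remain in the per-protocol analysis.
    A record gets pprotrfl = "Y" unless a deviation excludes it."""
    # Participant-level deviations exclude every record for the subject.
    subject_excl = {
        d["usubjid"] for d in adpdev if d["level"] == "PARTICIPANT"
    }
    # Visit-level deviations exclude only the matching visit and assay.
    visit_excl = {
        (d["usubjid"], d["avisit"], d["paramcd"])
        for d in adpdev
        if d["level"] == "VISIT"
    }
    out = []
    for rec in adimm:
        key = (rec["usubjid"], rec["avisit"], rec["paramcd"])
        excluded = rec["usubjid"] in subject_excl or key in visit_excl
        out.append({**rec, "pprotrfl": "" if excluded else "Y"})
    return out


# Hypothetical data: S01 has a visit-level deviation, S02 a participant-level one.
adimm = [
    {"usubjid": "S01", "avisit": "DAY 30", "paramcd": "ASSAY1", "aval": 120},
    {"usubjid": "S01", "avisit": "DAY 60", "paramcd": "ASSAY1", "aval": 340},
    {"usubjid": "S02", "avisit": "DAY 30", "paramcd": "ASSAY1", "aval": 95},
]
adpdev = [
    {"usubjid": "S01", "avisit": "DAY 60", "paramcd": "ASSAY1", "level": "VISIT"},
    {"usubjid": "S02", "avisit": None, "paramcd": None, "level": "PARTICIPANT"},
]
flagged = derive_pprotrfl(adimm, adpdev)
```

Keeping the two exclusion types in separate lookups mirrors the point that the same deviation term may lead to either participant-level or visit-level exclusion.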

SS-376 : Experimenting with Containers and webR for Submissions to FDA in the Pilot 4
André Veríssimo, Appsilon
Ismael Rodriguez, Appsilon

Pilot 4 of the R Consortium Submission Working Group marks a pioneering step in using R for FDA clinical trial submissions, introducing novel technologies like WebAssembly and Containers to elevate submission efficiency and effectiveness. This initiative, in collaboration with Appsilon, is revolutionizing the Shiny application packaging and transfer process, establishing a new benchmark that complements the static and extensive documentation. At the heart of Pilot 4's innovation are WebAssembly and Containers. WebAssembly enables R applications to run almost at native speed in web browsers, bypassing complex local installations and greatly enhancing access. This advancement is crucial for more efficient and secure operation of R-based applications, especially under the strict regulations of clinical trials. Containers, notably Podman, allow for the same benefits of WebAssembly in terms of portability with the advantage of being a well-established technology. Podman, in particular, is an open-source tool that places a strong emphasis on security, making it an ideal choice to comply with stringent FDA requirements. Pilot 4 is set to split into two separate submissions, focusing on WebAssembly and Containers, respectively. This approach facilitates a thorough assessment of each technology's capabilities in refining clinical trial submissions. As the R Consortium awaits feedback on previous pilots and moves forward with Pilot 4, this initiative is expected to significantly expedite clinical trial submission practices. Integrating these advanced technologies underlines the Consortium's dedication to deploying R-based methods for more effective and technologically sophisticated submissions in the future.

SS-377 : Challenges and Considerations When Building e-Submission SDTM Data Packages
Wei Duan, Moderna Therapeutics

When it comes to e-Submission SDTM package preparation, there are always plenty of hills to climb. Bringing e-Submission package preparation in-house is motivated by cost savings and by the wish to embody and reinforce the agility and ownership of internal programming experts. With helpers like Pinnacle 21 Enterprise (P21 E), a day-to-day data-checking and package-making tool that has been employed extensively in IND approved clinical trials, and in close collaboration with the internal Standard team and Data Management team, programmers are able to keep up to speed with CDISC, TAUG, and standard requirements from regulatory authorities, and to ensure compliance with the Study Data Tabulation Model (SDTM), which serves as the root of data for reviewers and as solid groundwork for subsequent ADaM and TLF workflows. In a recent BLA submission of vaccine studies, SDTM package preparation was not a simple, standard in-and-out process. As the devil is always in the details, in this paper we will go through several noteworthy challenges and considerations that lay behind the scenes along our journey of building the e-Submission SDTM packages.

SS-422 : Submitting Patient-Reported Outcome Data in Cancer Clinical Trials- Guidance for Industry Technical Specifications Document
Flora Mulkey, U.S. FDA

This presentation will focus on highlights from the new technical specifications guidance for industry on submitting patient-reported outcome (PRO) data collected in cancer clinical trials to support a marketing application for a medical product in oncology (November 2023). Recommendations are meant to foster standardization and consistency as well as traceability and include newly defined variables for datasets relevant to PROs. Guidance on data structure, tables, and figures depicting patient disposition, PRO data completeness, as well as PRO score distributions (cross-sectionally by assessment timepoint and as a change from baseline) will be discussed. Based on the PRO objective (comparative benefit vs. safety and tolerability), differences in specifications for recommended representation of missing data are provided, including data not collected based on logically skipped algorithms or as a result of computerized adaptive testing. How missing data recommendations can be applied to SDTM QS, and how they should be carried forward and presented in ADaM ADQS with phantom records, will be discussed. Finally, recommended dataset structures in ADaM using PARCATy variables are illustrated based on the type of PRO measure and the number of summary scores calculated. This technical specification document supplements FDA's draft guidance for industry Core Patient-Reported Outcomes in Cancer Clinical Trials (June 2021) and the PFDD Guidance series.


PO-106 : No LEAD Function? Let's Create It!
Xianhua Zeng, Elixir Clinical Research

In data analysis and statistical modeling, there are situations where the ability to access the next observations (look-ahead) for specific variables is essential for making comparisons and calculations. Unfortunately, SAS does not provide a built-in LEAD function for this purpose. This paper presents a custom macro-based methodology to create a LEAD function using PROC FCMP. To access LEAD values in various scenarios, we also introduce a versatile macro called "GetLeadValue". It supports obtaining lead values based on a specific by-group in a dataset. The macro code is currently hosted on my GitHub page:
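
To illustrate the look-ahead idea itself (this is not the author's macro, and it is written in Python rather than SAS), here is a minimal sketch of a by-group lead operation. The function name `add_lead`, its parameters, and the sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def add_lead(rows, by, var, n=1, default=None):
    """Return copies of rows with '<var>_lead' holding the value of `var`
    from the n-th following row within the same by-group."""
    out = []
    # groupby merges only adjacent rows, so rows must already be sorted by `by`.
    for _, group in groupby(rows, key=itemgetter(by)):
        group = list(group)
        for i, row in enumerate(group):
            # Look ahead n rows; fall back to `default` past the group's end.
            nxt = group[i + n][var] if i + n < len(group) else default
            out.append({**row, f"{var}_lead": nxt})
    return out


# Hypothetical visit-level data, sorted by subject.
visits = [
    {"usubjid": "001", "visit": 1, "aval": 5.0},
    {"usubjid": "001", "visit": 2, "aval": 6.5},
    {"usubjid": "002", "visit": 1, "aval": 4.2},
]
result = add_lead(visits, by="usubjid", var="aval")
```

The last row of each by-group receives the default value, so a look-ahead never crosses from one subject into the next.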

PO-123 : Enhancing Define-XML Generation: Based on SAS Programming and Pinnacle 21 Community
Kevin Sun, HKU-CTC

This paper introduces a method that uses SAS code to automatically enrich content with more details, ultimately improving the efficiency and comprehensiveness in the generation process of Define-XML. This approach enables SAS programmers to concentrate on the content of Define-XML without necessitating a deep understanding of XML syntax. Intended audience: Professionals familiar with SAS programming (macro) and possessing basic knowledge of Pinnacle 21 Community software. Software products used: SAS 9.4, Pinnacle 21 Community software 4.0.1.

PO-128 : A Deep Dive into Enhancing SAS/GRAPH® and SG Procedural Output with Templates, Styles, Attributes, and Annotation
Louise Hadden, Abt Associates Inc.

Enhancing output from SAS/GRAPH® has been the subject of many a SAS® paper over the years, including my own, some written with co-authors. The more recent graphic output from the SG procedures is often "camera-ready" without any user intervention, but occasionally there is a need for additional customization. SAS/GRAPH is a separate SAS product for which a specific license is required, while the SG procedures and the Graph Template Language are available to all BASE SAS users. This e-poster will explore both new opportunities within BASE SAS for creating remarkable graphic output as well as creating visualizations with SAS/GRAPH. Techniques in SAS/GRAPH, SG procedures and GTL such as PROC TEMPLATE, PROC GREPLAY, PROC SGRENDER, and GTL, SAS-provided annotation macros and the concept of "ATTRS" in SG procedures will be explored, compared, and contrasted. As background, a discussion of the evolution of SG procedures and the rise of GTL will be provided.

PO-129 : The Survival Mode
Varsha Ganagalla, Innovative Analytics, Inc.
Natalie Johnson, Innovative Analytics, Inc.

Researchers often overlook the impact of disease symptoms, diagnostics, treatments, and outcomes in survival analysis, as time remains the ultimate focal point. However, the observation of an event in most subjects might be affected by various factors. To name a few: the subjects might have been censored from the study due to withdrawal, loss to follow-up, or erroneous enrollment, and/or the event might not have been observed at all by the end of the study. Censoring makes it difficult to choose the right technique to account for probabilities. Kaplan-Meier survival estimates do a better job of handling censored data by providing survival probabilities and survival curves for groups of interest. On the other hand, the method cannot inform on the significance of the difference in probabilities between two or more groups unless this is requested explicitly using different options. Through this paper/poster, we intend to demonstrate multiple techniques that can be used in both SAS and R to establish the appropriate assumptions in order to build Kaplan-Meier survival estimates.
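
For readers unfamiliar with how censoring enters the Kaplan-Meier estimate, the following Python sketch computes the product-limit estimator by hand. It is an illustration only, not the SAS or R techniques the poster demonstrates; the function `kaplan_meier` and the sample data are hypothetical, and subjects censored at a given time are assumed to still be at risk for events at that same time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. `events` is 1 for an observed event, 0 for censored.
    Returns a list of (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Censored subjects contribute to the risk set but not to deaths.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical follow-up times; two subjects are censored (event = 0).
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

With these inputs the survival estimate drops only at times 2, 3, and 5; the censored subjects at times 3 and 8 shrink the risk set without producing a step.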

PO-145 : Integrity, Please: Three Techniques for One-Step Solution in Pharmaceutical Programming
Jason Su, SAS Inc.

In the pharmaceutical world, data are frequently reorganized or summarized before the final tabulation in tables or display in listings. SAS programmers typically use a straightforward method composed of three (3) steps: split the data set into different data sets, process them individually, and merge these data sets back together, which I call the Split-and-combine method. Despite its intuitive structure, this route comes with computational efficiency issues and requires extra effort to make sure the data structures in these data sets are compatible, so as to avoid undetected erroneous results. Here, I summarize three (3) field-tested methods, including the Do-loop of Whitlock (DoW), self-interleaving techniques, and direct outputting, to circumvent this method with primarily one (1) DATA step. The new programs usually contain far fewer steps and, more importantly, fewer or no intermediate data sets, which greatly saves system resources and reduces later maintenance effort. The principle is demonstrated with real-world examples and common scenarios for the three techniques summarized.
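
The contrast with Split-and-combine can be sketched in Python. The example below is loosely analogous to a double DoW loop (read a whole by-group to compute a summary, then pass over the same group again to write rows carrying that summary), but it is a Python analogue under assumed names, not the author's SAS code. The function `attach_group_max` and the sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def attach_group_max(rows, by, var):
    """Attach the by-group maximum of `var` to every row in one grouped pass,
    without creating and re-merging a separate summary data set."""
    out = []
    # Rows must already be sorted by `by`, as with BY-group processing.
    for _, group in groupby(rows, key=itemgetter(by)):
        group = list(group)
        # First loop over the group: compute the summary value.
        worst = max(r[var] for r in group)
        # Second loop over the same group: output rows with the summary attached.
        for r in group:
            out.append({**r, f"max_{var}": worst})
    return out


# Hypothetical adverse event grades, sorted by subject.
events = [
    {"usubjid": "A", "grade": 1},
    {"usubjid": "A", "grade": 3},
    {"usubjid": "B", "grade": 2},
]
with_max = attach_group_max(events, by="usubjid", var="grade")
```

Because the summary and the detail rows come from the same in-memory group, there is no merge step in which mismatched keys or structures could silently produce wrong results.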

PO-158 : If its not broke, don't fix it; existing code and the programmers' dilemma
Jayanth Iyengar, Data Systems Consultants LLC

In SAS shops and organizational environments, SAS programmers are responsible for working with existing processes and SAS code on which projects depend to produce periodic output and results and to meet deadlines. Some programming teams still cling to the old adage: if it's not broke, don't fix it. They've come to depend on code which runs clean and is reliable. However, besides processing with no errors and warnings, there are other criteria by which to judge the quality of a SAS program. Programming guidelines dictate that code should be well-documented, readable, and efficient, and should conform to best practices. This paper challenges the conventional wisdom that code which works shouldn't be modified.

PO-194 : Updates on Preparing a BIMO Data Package
Elizabeth Li, PharmaStat, LLC
Carl Chesbrough, PharmaStat, LLC
Inka Leprince, PharmaStat, LLC

Bioresearch Monitoring Program (BIMO) packages are used as part of FDA CDER submissions for the planning of BIMO inspections. These BIMO packages aid the review of electronic submission of New Drug Application (NDA) or Biologic License Application (BLA) content. In a 2020 PharmaSUG paper, titled "Preparing A Successful BIMO Data Package", we presented a process of producing BIMO information, particularly the subject-level data line listings by clinical site (by-site listings) and the summary-level clinical site (CLINSITE) dataset, along with the reviewer's guide and define.xml for CLINSITE. Since then, the FDA updated the technical conformance guide on August 11, 2022. This paper presents updates on how we implement and prepare electronic Common Technical Document (eCTD) documentation using the updated guidance document. The focus is on addressing challenges encountered and solutions developed during the preparation of a successful BIMO package. This paper aims to assist stakeholders in navigating the evolving regulatory landscape, ensuring compliance, and enhancing the efficiency of BIMO inspection within the framework of the updated FDA guidance.

PO-231 : Best Function Ever: PROC FCMP
Michael Stout, Johnson & Johnson Medical Device Companies

Base SAS provides a wealth of character and numeric functions. User defined functions can be created with PROC FCMP (SAS Function Compiler). PROC FCMP is a powerful tool that can invoke SAS procedures and the output delivery system and then return a single value to the calling program. This paper will show how to write and use user defined functions created using the special SAS function RUN_MACRO. Now you can create the best functions ever and deploy them in your SAS environment.

PO-258 : An approach to make Data Validation and Reporting tool using R Shiny for Clinical Data Validation
Madhavi Gundu, Ephicacy
Vivek Jayesh Mandaliya, Ephicacy Lifesciences

Data validation in clinical trials plays a critical role in ensuring the integrity, reliability, and validity of the data collected during the study. Clinical trials are essential for evaluating the safety and efficacy of new drugs and medical devices, and the results of these trials can have significant implications for patient care, regulatory approvals, and public health. Traditionally, data validation has relied heavily on proprietary software such as SAS to generate reports, which may come with limitations in terms of flexibility and accessibility. With the advent of open-source tools like R and Python, we have developed a data validation tool using R Shiny. The tool introduces a dynamic, user-friendly interface with the option to select data points and logic blocks that a user can customize per the validation criteria, and it generates reports without requiring any programming skills. Our tool empowers data management teams to conduct efficient and accurate data validation.

PO-292 : Elevate Your Game: Leveling Up SDTM Validation with the Magic of Data Managers
Julie Ann Hood, Pinnacle 21
Jennifer Manzi, Pinnacle 21

Selecting players armed with unique abilities to collaborate with is the key to crafting an unbeatable strategy while on the field, navigating a quest, or in the office. In the realm of standardized clinical trial data, not only can this break down departmental silos, but it can also enhance the quality of study data, leading to the availability of more effective, efficient treatments earlier. Historically, due to the timing and content of SDTM validation, many organizations tasked SDTM programmers with unraveling issues in these reports. However, there are some rules that require tracing SDTM data back to raw data or data collection to optimize decision-making. Seamlessly integrating data managers into this process can help slash time to resolution of these types of issues as well as boost overall data quality. This poster will provide important considerations and guidance on weaving data managers into the SDTM validation process. Unveiling different types of workflows will illustrate how to level up your approach to resolving validation issues. Curating appropriate FDA validation rules along with detailed examples will showcase how these are best served by the unique positioning of data managers. Lastly, suggested training as well as ideas to power-up your data managers will equip you to battle issues at the source and conquer those data demons.

PO-299 : Upholding Blinding in Clinical Trials: Strategies and Considerations for Minimizing Bias
Chen Li, Boehringer Ingelheim
Hong Wang, Boehringer Ingelheim
Ke Xiao, Boehringer Ingelheim

Randomized controlled trials are the gold standard for effectiveness research, and preserving data integrity throughout trial conduct is crucial to prevent potential statistical and operational bias. However, there are scenarios where some unblinded analysis may be necessary during trial conduct, such as IND safety reporting that requires aggregate analyses for the FDA, Data Monitoring Committee (DMC), or Interim Analysis (IA), among others. Therefore, sponsors must meticulously plan and assess any unblinded activities during trial conduct, ensuring that the actual treatment allocation information is confined to the personnel group as outlined in the access plan. This poster presents an approach to manage planned/unplanned unblinding activities during conduct.
Outline:
- Establishing a Firewall (logistics and access plan)
  o Time frame for maintaining blinding and unblinding required activities
  o Data transfer (patient-level clinical data, vendor data, and randomization list)
  o Interaction between the blinded and unblinded teams
- Assessing Potential Unblinding Risk
  o Risk of Adverse Events incidence and subsequent actions to drug administration
  o Lab results showing varying trends between arms
  o Efficacy endpoints with significant variation between arms
  o Differences in dosage regimen (administration frequency, administration route, treatment duration)
  o Ad-hoc requests from DMC/IA closed discussion
- Conducting Unblinding
  o Independent team: a sponsor-employed, fully unblinded independent team in the Global Biostatistics & Data Sciences (gBDS) organization, the Independent Statistical Analysis Team (iSAT)
  o Unblinding treatment information
  o Masking data with potential unblinding risk
  o Re-masking of DMC/IA outputs for blinded team review
- Advantages of this approach

PO-324 : Plotting Data by US ZIP Code
David Franklin,

Plotting frequency of events by zip code is something that is visually easier to understand than a set of tables with numbers. This paper introduces a way of generating such a graphic using the SAS-supplied zip code file for mapping purposes and merging it with data to produce a quality plot. The example given in the paper will be the number of public hospitals across the 50 states of the USA.

PO-407 : Tips and Tricks for using the CMS Platform
Kena Patel, Pfizer
Jingying Zhou, Pfizer

This poster will cover lessons learned as a new user of the Center for Medicare and Medicaid Services (CMS) Virtual Research Data Center (VRDC) platform. Topics covered will include use of R, Python, SQL, and SAS for data analysis in both Databricks and the SAS Enterprise Guide, structure of the CMS data, resources available to programmers through CMS, and tips and tricks learned while working in the CMS VRDC environment.

PO-418 : RWD Exploration through R Shiny
Darren Jeng, Pfizer
Sachin Heerah, Pfizer

This poster aims to explore the capabilities of an R Shiny application in examining real-world data (RWD). R Shiny allows for reporting through dynamic data visualizations, including graphs, charts, and tables. General summary statistics can be reported as an interactive dashboard. Cohorts can also be stratified using R Shiny widgets for further examination between subgroups. Overall, this tool can drive decision-making using ad-hoc and on-demand explorations in RWD.

PO-440 : The SAS Genome - Genetic Sequencing
Oliver Lu, Eurofins Viracor
Katie Watson, Eurofins Viracor

After COVID-19 caused a chaotic shake-up around the world, a growing number of pharmaceutical companies are placing more emphasis on genetic sequencing alongside their requested sample testing. Sequencing data outputs from the laboratory typically contain very large volumes of data in various formats, and there may be customization requests along the way. Depending on the third-party vendors associated with a project, you may need to convert some files and organize the data into a comprehensive SAS dataset or report, and this is where scenarios may get interesting.

PO-451 : Simplifying Edit Check Configuration
Vandita Tripathi, Ms
Manas Saha, TCS

Clinical Data Management is a pivotal process: without good-quality data, there is no accurate statistical analysis. Hence, maintaining data integrity and quality has a critical influence on study success. Edit checks are the first line of defense against data entry errors. They are small programs or algorithms that validate the data to make sure it meets standards and requirements, and they alert the user when data entry fields contain errors. Hence, configuring the edit checks correctly is a crucial step in designing eCRF and non-eCRF forms. In most eCRF form design solutions, configuring the edit checks is a cumbersome process, and it often takes a long time to upskill the edit check programmer. An AI-based solution that can understand natural language input and translate it into an edit check validation algorithm can solve this problem. The solution accepts a plain-English prompt from the form designer specifying the validation requirements; the AI then translates those requirements into software programs or algorithms and attaches them to the eCRF fields. When a data entry operator fills in the form, these algorithms are triggered to detect any discrepancy and warn the user automatically. This solution has the potential to greatly reduce the complexity of designing eCRF forms and automating edit checks. It does not require the operator to have programming skills, and it can drastically cut the time and manual effort needed to design eCRF forms.
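
To make the idea of an edit check concrete, here is a minimal Python sketch of what a configured check might look like once it has been expressed as code. It does not show the AI translation step described in the abstract, and it is not the authors' implementation. The field names (`systolic_bp`, `visit_date`, `consent_date`), the 60 to 250 mmHg range, and the `EditCheck` and `run_checks` names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class EditCheck:
    """One validation rule attached to a form field (hypothetical structure)."""
    field: str
    message: str
    is_valid: Callable[[dict], bool]


def run_checks(record, checks):
    """Return one warning string per failed check, in the order the checks are listed."""
    return [f"{c.field}: {c.message}" for c in checks if not c.is_valid(record)]


# Illustrative rules only; the range and the field names are assumptions.
CHECKS = [
    EditCheck(
        field="systolic_bp",
        message="must be between 60 and 250 mmHg",
        is_valid=lambda r: r.get("systolic_bp") is not None
        and 60 <= r["systolic_bp"] <= 250,
    ),
    EditCheck(
        field="visit_date",
        message="must not be before the informed consent date",
        is_valid=lambda r: r["visit_date"] >= r["consent_date"],
    ),
]

# A record with two data entry errors, and a clean record for comparison.
bad_record = {
    "systolic_bp": 300,
    "consent_date": date(2024, 1, 10),
    "visit_date": date(2024, 1, 5),
}
good_record = {
    "systolic_bp": 120,
    "consent_date": date(2024, 1, 10),
    "visit_date": date(2024, 1, 15),
}

bad_queries = run_checks(bad_record, CHECKS)
good_queries = run_checks(good_record, CHECKS)
```

Keeping each rule as data (a field, a message, and a predicate) is one way a form designer's plain-language requirement could map onto something attachable to a field; other representations are equally plausible.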