Paper presentations are the heart of a PharmaSUG conference. Here is the list of the initial confirmed first pass paper selections. Papers are organized into 12 academic sections and cover a variety of topics and experience levels. This list will be updated once all of the paper selections have been finalized.

Note: This information is subject to change. Last updated 22-Jan-2023.


Advanced Programming

Paper No. Author(s) Paper Title (click for abstract)
AP-021 Richann Watson Have a Date with ISO®? Using PROC FCMP to Convert Dates to ISO 8601
AP-026 Stephen Sloan & Kirk Paul Lafler A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
AP-033 Stephen Sloan Twenty Ways to Run Your SAS® Program Faster and Use Less Space
AP-039 Wei Shao A utility to combine study outputs into all-in-one PDF for DSUR
AP-048 Philip Mason Documenting your SAS programs with Doxygen and automatically generated diagrams.
AP-049 Matthew Slaughter & Isaiah Lankham Friends are better with Everything: A User's Guide to PROC FCMP Python Objects in Base SAS
AP-061 Richann Watson & Louise Hadden Going Command(o): Power(Shell)ing Through Your Workload
AP-063 Laura Elliott & Crystal Cheng RESTful Thinking: Using R Shiny and Python to streamline REST API requests and visualize REST API responses
AP-079 Bruce Gilsen Using the R interface in SAS® to Call R Functions and Transfer Data
AP-086 Brian Varney An Introduction to Obtaining Test Statistics and P-Values from SAS® and R for Clinical Reporting
AP-094 Troy Hughes Sorting a Bajillion Variables—When SORTC and SORTN Subroutines Have Stopped Satisfying, User-Defined PROC FCMP Subroutines Can Leverage the Hash Object to Reorder Limitless Arrays

Data Standards

Paper No. Author(s) Paper Title (click for abstract)
DS-035 Nancy Brucken & David Neubauer & Soumya Rajesh Sound SDTM, Sound ADaM – Orchestrating SDTM and ADaM Harmonization
DS-036 Clio Wu Why FDA Medical Queries (FMQs) for Adverse Events of Special Interest? Implementation and Case Study
DS-041 Xiangchen Cui Leverage and Enhance CDISC TAUGs to Build More Traceability for and Streamline Development of Efficacy ADaM in Oncology Studies
DS-051 Anastasiia Drach The Phantom of the ADaM: adding missing records to BDS datasets
DS-054 Nadiia Pukhliar SDTM Variables You Might Forget About
DS-114 Ajay Gupta CDISC SDTM IG v3.4: Subject Visits
DS-129 Kaleigh Ragan & Richann Watson Have Meaningful Relationships: An Example of Implementing SDTM Domain RELREC with a Many-to-Many Relationship

Data Visualization and Reporting

Paper No. Author(s) Paper Title (click for abstract)
DV-022 Madhura Nagarkar Ggplotly – A Powerful Tool to bring Static Plots to Life
DV-024 Jeffrey Meyers Methods of a Fully Automated CONSORT Diagram Macro %CONSORT
DV-111 Ilya Krivelevich & Cixin He & Binbin Zhang-Wiener & Wenyin Lin Enhanced Spider Plot in Oncology
DV-134 Abhinav Srivastva Life Table Analysis for Time to First Event Onset

Leadership Skills

Paper No. Author(s) Paper Title (click for abstract)
LS-004 Daryna Yaremchuk Are you a great team player?
LS-028 Stephen Sloan Developing and running an in-house SAS Users Group
LS-038 Laura Needleman The Interview Process: An Autistic Perspective
LS-056 Josh Horstman & Richann Watson Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
LS-095 Carey Smoak Lessons Learned from a Retired SAS® Programmer

Metadata Management

Paper No. Author(s) Paper Title (click for abstract)
MM-118 Michael Hagendoorn & Ran Li & Mimi Vigil & Shan Yang Masters of the Table Universe: Creating Table Shells Consistently and Efficiently Across All Studies

Quick Programming Tips

Paper No. Author(s) Paper Title (click for abstract)
QT-030 Stephen Sloan Running Parts of a SAS Program while Preserving the Entire Program
QT-046 Brad Danner & Indrani Sarkar Repetitive Analyses in SAS® – Use of Macros Versus Data Inflation and BY Group Processing
QT-123 Lauren Rackley Introducing a QC Checklist for High Quality TLFs

Real World Evidence and Big Data

Paper No. Author(s) Paper Title (click for abstract)
RW-050 Matthew Slaughter & Denis Nyongesa & John Dickerson & Jennifer Kuntz Real World Evidence in Distributed Data Networks: Lessons from a Post-Marketing Safety Study

Solution Development

Paper No. Author(s) Paper Title (click for abstract)
SD-069 Yunxia Sui & Xianwei Bu Application of Tipping Point Analysis in Clinical Trials using the Multiple Imputation Procedure in SAS
SD-070 Menaga Guruswamy Ponnupandy Do it the smart way, renumber with PowerShell scripts!
SD-084 Chao Su & Jaime Yan & Changhong Shi A Macro Utility for CDISC Datasets Cross Checking
SD-098 Jenny Zhang & Shunbing Zhao Challenges of Developing Microbiology Dataset
SD-103 Jeff Xia & Chandana Sudini A SAS Macro to Perform Consistency Check in CSR Footnote References
SD-109 Yogesh Pande & Donna Hyatt & Brandy Cahill Importance of Creating a Learning Portal for Statistical Programming End-to-End Processes
SD-122 Huei-Ling Chen & Heng Zhou & Nan Xiao Building an Internal R Package for Statistical Analysis and Reporting in Clinical Trials: A SAS User's Perspective

Statistics and Analytics

Paper No. Author(s) Paper Title (click for abstract)
SA-068 Martin Karpefors & Srivathsa Ravikiran & Samvel Gasparyan Validating novel maraca plots – R and SAS love story

Strategic Implementation & Innovation

Paper No. Author(s) Paper Title (click for abstract)
ST-005 Lucy Dai What's the story in your subgroup analysis
ST-006 Piyush Singh Digital Data Flow (DDF) and Technological Solution Providers
ST-010 Srinivasa Rao Mandava Key required infrastructure for digital transformation in pharmaceutical clinical data operations
ST-102 James Zhao & Hong Qi & Mary Varughese Key Statistical Programming Considerations in External Collaborative Clinical Trials
ST-106 Liqiang He Automation of Dataset Programming Based on Dataset Specification

Submission Standards

Paper No. Author(s) Paper Title (click for abstract)
SS-003 Joe Xi & Yuanyuan Liu Handling CRS/NT Data in CAR-T Studies and Submission
SS-045 Jennifer Manzi & Julie Ann Hood Optimizing Efficiency and Improving Data Quality through Meaningful Custom Fix Tips and Explanations
SS-112 Charumathy Sreeraman Standardization of Reactogenicity Data into Findings
SS-127 Elizabeth Dennis & Monika Kawohl & Paul Slagle Proposal for New ADaM Paired Variables: PARQUAL/PARTYPE


Paper No. Author(s) Paper Title (click for abstract)
EP-018 Philip Johnston & Julie Ann Hood Visualization for Success: Driving KPIs and Organizational Process Improvements via Portfolio-level Analytics
EP-089 Danny Hsu & Shreya Chakraborty Creating a Centralized Controlled Terminology Mapping Repository


Advanced Programming

AP-021 : Have a Date with ISO®? Using PROC FCMP to Convert Dates to ISO 8601
Richann Watson, DataRich Consulting

Programmers frequently have to deal with dates and date formats. At times, determining whether a date is in a day-month or month-day format can leave us confounded. Clinical Data Interchange Standards Consortium (CDISC) has implemented the use of the International Organization for Standardization (ISO) format, ISO® 8601, for datetimes in SDTM domains, to alleviate the confusion. However, converting "datetimes" from the raw data source to the ISO 8601 format is no picnic. While SAS® has many different functions and CALL subroutines, there is not a single magic function to take raw datetimes and convert them to ISO 8601. Fortunately, SAS allows us to create our own custom functions and subroutines. This paper illustrates the process of building a custom function with custom subroutines that takes raw datetimes in various states of completeness and converts them to the proper ISO 8601 format.
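
As a rough illustration of the idea (not the author's PROC FCMP function), the following Python sketch converts date and time components in various states of completeness to an ISO 8601 string. The function name `to_iso8601` and its parameters are hypothetical. It assumes the common SDTM convention that missing trailing components are truncated and a missing intermediate component is represented by a single hyphen.

```python
from typing import Optional


def to_iso8601(year: Optional[int] = None, month: Optional[int] = None,
               day: Optional[int] = None, hour: Optional[int] = None,
               minute: Optional[int] = None, second: Optional[int] = None) -> str:
    """Build an ISO 8601 string from possibly incomplete components.

    Hypothetical helper for illustration only. Assumed convention:
    trailing missing components are dropped, and a missing component
    in the middle is written as a single hyphen.
    """
    values = [year, month, day, hour, minute, second]
    widths = [4, 2, 2, 2, 2, 2]

    # Positions of the components that are actually known.
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return ""

    # Keep everything up to the last known component; "-" marks a gap.
    last = known[-1]
    parts = ["-" if v is None else f"{v:0{w}d}"
             for v, w in zip(values, widths)][: last + 1]

    date = "-".join(parts[:3])   # year, month, day
    time = ":".join(parts[3:])   # hour, minute, second (may be empty)
    return date + ("T" + time if time else "")


print(to_iso8601(2003, 12, 15, 13, 14, 17))
print(to_iso8601(2003, 12))
print(to_iso8601(2003, None, 15))
```

Range checks (for example, rejecting month 13) are deliberately left out to keep the sketch short.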

AP-026 : A Quick Look at Fuzzy Matching Programming Techniques Using SAS® Software
Stephen Sloan, Accenture
Kirk Paul Lafler, sasNerd

Data comes in all forms, shapes, sizes and complexities. SAS® users across industries know all too well that data, stored in files and data sets, can be, and often is, problematic and plagued with a variety of issues. Two data files can be joined without a problem when they have identifiers with unique values. However, many files do not have unique identifiers, or "keys", and need to be joined by character values, like names or E-mail addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates data sets containing a sampling of data issues, popular data cleaning and user-defined validation techniques, data transformation techniques, traditional merge and join techniques, an introduction to SAS character-handling functions for phonetic and fuzzy matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED, and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join and match less than perfect, or "messy", data. Although the programming techniques are illustrated using SAS code, many, if not most, of the techniques can be applied to any software platform that supports character-handling.
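
To make the platform-independence point concrete, here is a minimal Python sketch of one such technique: matching a name against candidates by Levenshtein edit distance, the measure that the SAS COMPLEV function reports. The helper names `levenshtein` and `best_match` and the `max_distance` threshold are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(name: str, candidates: List[str],
               max_distance: int = 2) -> Optional[str]:
    """Return the closest candidate (case-insensitive), or None if no
    candidate is within max_distance edits. Threshold is an assumption."""
    if not candidates:
        return None
    scored = [(levenshtein(name.lower(), c.lower()), c) for c in candidates]
    distance, candidate = min(scored)
    return candidate if distance <= max_distance else None


print(best_match("Jon Smith", ["John Smith", "Jane Smyth"]))
```

In practice the threshold would be tuned to the data, and a phonetic key such as Soundex could be used first to narrow the candidate list.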

AP-033 : Twenty Ways to Run Your SAS® Program Faster and Use Less Space
Stephen Sloan, Accenture

When we run SAS® programs that use large amounts of data or have complicated algorithms, we often are frustrated by the amount of time it takes for the programs to run and by the large amount of space required for the program to run to completion. Even experienced SAS programmers sometimes run into this situation, perhaps through the need to produce results quickly, through a change in the data source, through inheriting someone else's programs, or for some other reason. This paper outlines twenty techniques that can reduce the time and space required for a program without requiring an extended period of time for the modifications. The twenty techniques are a mixture of space-saving and time-saving techniques, and many are a combination of the two approaches. They do not require advanced knowledge of SAS, only a reasonable familiarity with Base SAS® and a willingness to delve into the details of the programs. By applying some or all of these techniques, people can gain significant reductions in the space used by their programs and the time it takes them to run. The two concerns are often linked, as programs that require large amounts of space often require more paging to use the available space, and that increases the run time for these programs.

AP-039 : A utility to combine study outputs into all-in-one PDF for DSUR
Wei Shao, Bristol Myers Squibb

During clinical development, periodic analysis of safety information is crucial for assessing risk to trial participants, and it gives health authorities (HAs) important information for evaluating the safety profile of the investigational drug on a regular basis. As one of the important aggregate safety reports, the development safety update report (DSUR) provides a periodic update on drug safety information. Statistical programming teams are often responsible for generating cumulative and within-reporting-period outputs to support the DSUR. This paper focuses on the two region-specific listings provided for the DSUR. Further, the paper introduces a SAS macro that generates consolidated PDF files from sets of individual study SAS listings. The resulting PDF file(s) contain all the converted listings with self-extracted and properly sorted bookmarks. The macro package turns individual listings in a study into one or more submission ready compliant (SRC) PDF files for DSUR submission.

AP-048 : Documenting your SAS programs with Doxygen and automatically generated diagrams.
Philip Mason, Wood Street Consultants

Doxygen has been used to document programs for over 25 years. It involves using tags in comments to generate HTML, RTF, PDF and other forms of high quality documentation. It supports the DOT language for making diagrams from simple text directives. PROC SCAPROC can be used to generate a trace of a SAS® program's execution. My SAS® code can then analyse the trace and produce DOT language directives to make a diagram of the execution of that SAS program. Those directives can then be put into the Doxygen tags to add the diagram to your documentation. The analysis can also show the performance of the SAS program, which can be used for tuning purposes. This paper shows how to use Doxygen with SAS, and provides the code to automatically produce diagrams for documentation or tuning purposes.

AP-049 : Friends are better with Everything: A User's Guide to PROC FCMP Python Objects in Base SAS
Matthew Slaughter, Kaiser Permanente Center for Health Research
Isaiah Lankham, University of California Office of the President

Flexibly combining the strengths of SAS and Python allows programmers to choose the best tool for the job and encourages programmers working in different languages to share code and collaborate. Incorporating Python into everyday SAS code opens up SAS users to extensive libraries developed and maintained by the open-source Python community. The Python object in PROC FCMP embeds Python functions within SAS code, passing parameters and code to the Python interpreter and returning the results to SAS. User-defined SAS functions or call routines executing Python code can be called from the DATA step or any context where built-in SAS functions and routines are available. This paper provides an overview of the syntax of FCMP Python objects and practical examples of useful applications incorporating Python functions into SAS processes. For example, we will demonstrate incorporating Python packages into SAS code for leveraging complex API calls such as validating email addresses, geocoding street addresses, and importing a YAML file from the web into SAS. All examples from this paper are available at

AP-061 : Going Command(o): Power(Shell)ing Through Your Workload
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

Simplifying and streamlining workflows is a common goal of most programmers. The most powerful and efficient solutions may require practitioners to step outside of normal operating procedures, outside their comfort zone. Programmers need to be open to finding new (or old) techniques to achieve efficiency and elegance goals: SAS® by itself may not provide the best solutions for such challenges as ensuring that batch submits preserve appropriate log and lst files, archiving projects and folders, and unzipping files. In order to adhere to such goals as efficiency and portability, there may be times when it is necessary to utilize other resources, especially if colleagues may need to perform these tasks without the use of SAS software. These data management tasks may be performed via the use of tools such as the command-line interpreter (including DOS and Linux) and Windows PowerShell if available to users, both used externally and within SAS software sessions. We will briefly discuss additional tools that are used in conjunction with command line interpreters and PowerShell for our examples, such as the WINZIP command line interface.

AP-063 : RESTful Thinking: Using R Shiny and Python to streamline REST API requests and visualize REST API responses
Laura Elliott, SAS Institute Inc.
Crystal Cheng, SAS Institute Inc.

REST APIs are a popular way to make HTTP requests to access and use data due to their simplicity. They can be used to carry out several types of actions in a statistical computing environment. Even though they are simple, some limitations have been observed, such as lack of detail in responses, difficulty in debugging the failure of certain actions, and the need for the user to have some basic HTTP request knowledge. This research focuses on mitigating these limitations by utilizing the strengths of both R and Python to build a user interface that executes REST APIs easily and displays responses with more detail. R Shiny was used to create an easy-to-use interface that contains several embedded HTTP requests, written using the Python requests module, that can be easily executed regardless of a user's previous knowledge. These requests perform specified actions in a statistical computing environment and return detailed results that are viewable in the R Shiny dashboard. This paper will explain the concept and build process of the dashboard, discuss techniques used to integrate the R and Python programming languages, and introduce the resulting dashboard. Finally, the paper will discuss challenges faced during development and some considerations for the future enhancement of REST APIs. The products used for development include the R and Python programming languages and the statistical computing environment SAS Life Sciences Analytics Framework. This paper is intended for individuals with R and Python experience, and those who have knowledge of REST APIs.

AP-079 : Using the R interface in SAS® to Call R Functions and Transfer Data
Bruce Gilsen, Federal Reserve Board of Governors

Starting in SAS® 9.3, the R interface enables SAS users on Windows and Linux who license SAS/IML® software to call R functions and transfer data between SAS and R from within SAS. Potential users include SAS/IML users and other SAS users who can use PROC IML just as a wrapper to transfer data between SAS and R and call R functions. This paper provides a basic introduction and some simple examples. The focus is on SAS users who are not PROC IML users, but who want to take advantage of the R interface.

AP-086 : An Introduction to Obtaining Test Statistics and P-Values from SAS® and R for Clinical Reporting
Brian Varney, Experis

Getting values of test statistics and p-values out of SAS® and R is quite easy in each of the software packages but also quite different from each other. This paper intends to compare and contrast the SAS and R methods for obtaining these values from tests involving Chi-Square and Linear Models such that they can be leveraged in tables, listings, and figures. This paper will include but not be limited to the following topics:
* SAS ODS trace
* SAS PROC FREQ
* SAS PROC GLM
* R stats::chisq.test() function
* R stats::aov() function
* R sasLM package functions
* R broom package functions
The audience for this paper is intended to be programmers familiar with SAS and R but not necessarily at an advanced level.

AP-094 : Sorting a Bajillion Variables—When SORTC and SORTN Subroutines Have Stopped Satisfying, User-Defined PROC FCMP Subroutines Can Leverage the Hash Object to Reorder Limitless Arrays
Troy Hughes, Datmesis Analytics

The SORTC and SORTN subroutines sort character and numeric data, respectively. These subroutines are sometimes referred to as "horizontal sorts" because they sort variables or values as opposed to observations. Thus, all elements within a SORTC or SORTN sort must be maintained in a single observation. A significant limitation of SORTC and SORTN is their inability to sort more than 800 variables. To overcome this arbitrary threshold, user-defined subroutines can be engineered that leverage the hash object to sort limitless variables. The hash object natively orders values that are ingested into it (when the ORDERED argument specifies ASCENDING or DESCENDING), and once populated, the sorted results are exported to an array that is returned to the DATA step. These user-defined functions benefit from efficient, in-memory hash operations while expanding the scalability and functionality of built-in sort subroutines.

Data Standards

DS-035 : Sound SDTM, Sound ADaM – Orchestrating SDTM and ADaM Harmonization
Nancy Brucken, IQVIA
David Neubauer, IQVIA
Soumya Rajesh, IQVIA

Sound SDTM data is integral to having sound ADaM data. The ADaM model says "Whereas ADaM is optimized to support data derivation and analysis, CDISC's Study Data Tabulation Model (SDTM) is optimized to support data tabulation". Oftentimes those who implement SDTM are not implementing ADaM and vice versa, and they may not be working in harmony. If how the data will be analyzed has not been considered, tasks such as the definition of treatment arms and elements, and the assignment of collected data to SDTM domains, can present various challenges during analysis and with traceability. This paper will cover some of the situations the authors have encountered and discuss how SDTM and ADaM implementation can be better in tune.

DS-036 : Why FDA Medical Queries (FMQs) for Adverse Events of Special Interest? Implementation and Case Study
Clio Wu, Chinook Therapeutics Inc.

There are many challenges associated with safety analyses and reporting of adverse events (AEs) in clinical trials, including, but not limited to, study design issues, coding of the AEs, selection of the AEs of special interest (AESIs), inadequate grouping of likely or potentially related AEs, events that present in different ways or are reported with different terms, and AE terms that are too specific, which can result in underestimation of an event. To standardize the NDA/BLA safety data review process, the U.S. FDA/CDER published two documents on 05 September 2022 and collaborated with the Duke-Margolis Center for Health Policy to host a public workshop on 14 September 2022 to introduce the FDA Medical Queries (FMQs) and the Standard Safety Tables and Figures Integrated Guide. The author has actively reviewed and promoted the implementation of FMQs at the author's company to resolve the issue of unidentifiable Customized MedDRA Queries (CMQs) defined in legacy studies for AESIs, which led to the official implementation of this newly released AE grouping. This paper will share the experience of promoting and implementing FMQs, evaluating the FDA-published FMQ docket for potential issues, and providing feedback to enhance future releases; developing efficient, standardized end-to-end processes for FMQ data pulling and AESI data analysis and reporting; and incorporating Standardized MedDRA Queries (SMQs), FMQs, and potential company-defined CMQs to standardize the medical monitoring process and ensure consistent implementation within the company. The paper will also share an FMQ case study for NDA ISS analysis and CSR reporting.

DS-041 : Leverage and Enhance CDISC TAUGs to Build More Traceability for and Streamline Development of Efficacy ADaM in Oncology Studies
Xiangchen Cui, Crisprtx Therapeutics

The CDISC Breast Cancer Therapeutic Area User Guide and Prostate Cancer Therapeutic Area User Guide presented 'ADEVENT' and 'ADDATES' independently, in 2016 and 2017. One of the primary reasons for creating these intermediate datasets is to support traceability, which is built into the event dataset and/or date dataset through the triplet of SRCDOM, SRCVAR, and SRCSEQ variables; all potential dates from them are used as inputs to generate a Time-to-Event (TTE) analysis. ADEVENT can also support another analysis dataset, 'ADRESP', for best overall response, etc. The derivation of dates from tumor assessments is not straightforward and is much more complex, especially when Response Evaluation Criteria in Solid Tumors (RECIST 1.1) is applied in the derivation, where confirmation of a complete response (CR) or partial response (PR) is required. Hence the traceability of these derivations is critical for building confidence in the analysis. The triplet from ADEVENT is not sufficient for the traceability of the events derived from tumor assessments due to the complexity of the derivation. The triplet from ADDATES only provides traceability for the derivation of dates that are independent of tumor assessments. This paper explains the pros and cons of both and introduces a new approach to enhance them so that they can be broadly applied to other areas of oncology studies to build more traceability, further streamline the development of the efficacy datasets ADEVENT, ADRESP, ADDATES, and ADTTE for both categorical analysis of tumor response and TTE analysis, and follow best programming practice.

DS-051 : The Phantom of the ADaM: adding missing records to BDS datasets
Anastasiia Drach, Intego Group LLC

Missing data is a 'pain' of any study. There are many imputation techniques available, but sometimes all we need to know is just that the data is missing. In these cases, it is useful to add derived records to your ADaM datasets with missing AVAL/AVALC to indicate missed visits or timepoints. Such records are called phantom records. In this paper, we discuss how to add them to a BDS ADaM dataset using PRO data as an example. We will start with an overview of different ways to represent missing data in SDTM. The paper will present several types of analysis which require the inclusion of phantom records to account for missing data. It will cover various scenarios of adding such records, from the most straightforward to more complex ones. Finally, we will provide some ready-to-use solutions for the creation of phantom records, which could be easily adjusted to your individual needs.
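
The simplest scenario, one record per subject per expected visit, can be sketched in Python as follows. This is only an illustration and not the paper's solution: the function name `add_phantom_records`, the list-of-dicts layout, and the use of DTYPE = "PHANTOM" to flag the added rows are all assumptions made for the example.

```python
from typing import Dict, List


def add_phantom_records(records: List[Dict], expected_visits: List[str]) -> List[Dict]:
    """Return the records plus one AVAL-missing row for every
    subject/visit pair that was expected but not observed.

    Assumptions (hypothetical): each record is a dict with USUBJID,
    AVISIT and AVAL, and added rows are flagged with DTYPE = "PHANTOM".
    """
    observed = {(r["USUBJID"], r["AVISIT"]) for r in records}
    subjects = sorted({r["USUBJID"] for r in records})

    result = [dict(r) for r in records]  # copy so the input is not modified
    for subject in subjects:
        for visit in expected_visits:
            if (subject, visit) not in observed:
                result.append({"USUBJID": subject, "AVISIT": visit,
                               "AVAL": None, "DTYPE": "PHANTOM"})

    # Sort by subject, then by the protocol order of the visits.
    order = {visit: i for i, visit in enumerate(expected_visits)}
    result.sort(key=lambda r: (r["USUBJID"], order.get(r["AVISIT"], len(order))))
    return result


pro = [
    {"USUBJID": "001", "AVISIT": "Baseline", "AVAL": 12.0},
    {"USUBJID": "001", "AVISIT": "Week 8", "AVAL": 9.0},
    {"USUBJID": "002", "AVISIT": "Baseline", "AVAL": 15.0},
]
for row in add_phantom_records(pro, ["Baseline", "Week 4", "Week 8"]):
    print(row)
```

More complex scenarios, such as stopping at a subject's discontinuation date, would need extra inputs that this sketch does not model.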

DS-054 : SDTM Variables You Might Forget About
Nadiia Pukhliar, Intego Group, LLC.

With every new version of SDTM/SDTM IG more and more examples of raw data mapping are presented, more details on specific variables are described. However, in practice the same mapping rules are transferred from study to study with no changes and everything less common is either mapped to the Supplemental Qualifiers SUPP-- datasets, the Findings About Events or Interventions (FA) domain or is not submitted at all. The paper collects several cases of SDTM mapping providing more coherent and detailed representation of collected data. Special attention in the paper is given to the Supplemental Qualifiers datasets examining standard supplemental qualifiers name codes per the SDTM IG. Further, we are sharing tricks on using ADaM IG to get standard qualifier names in SUPP-- domains. Additional focus of the paper is on using accompanying text in the CRF and the protocol to procure more context in SDTM datasets by creating standard variables from the model that are not described in the implementation guide. The examples provided represent CRF pages and studies from our practice, they are a great testament to the versatility of the SDTM that covers various study and data collection designs.

DS-114 : CDISC SDTM IG v3.4: Subject Visits
Ajay Gupta, Daiichi Sankyo

The Study Data Tabulation Model Implementation Guide for Human Clinical Trials (SDTMIG) Version 3.4 has been prepared by the Submissions Data Standards (SDS) team of the Clinical Data Interchange Standards Consortium (CDISC). Like its predecessors, v3.4 is intended to guide the organization, structure, and format of standard clinical trial tabulation datasets submitted to a regulatory authority. Version 3.4 supersedes all prior versions of the SDTMIG. In this presentation, I will give a quick walk-through of the updates in SDTM IG v3.4 relative to its predecessor. Later, I will go over the updated SUBJECT VISITS (SV) domain with examples, e.g., the newly proposed mapping to include missed visits and how to use additional variables in SV.

DS-129 : Have Meaningful Relationships: An Example of Implementing SDTM Domain RELREC with a Many-to-Many Relationship
Kaleigh Ragan, Crinetics Pharmaceuticals
Richann Watson, DataRich Consulting

The Related Records (RELREC) domain is a tool, provided in the Study Data Tabulation Model (SDTM), for conveying relationships between records housed in different domains. Most SDTM users are familiar with a one-to-one relationship type, where a single record from one domain is related to a single record in a separate domain. Or even a one-to-many relationship type, where a single record from one domain may be related to a group of records in another. But what if there are two groups of records related to one another? How do you properly convey the relationship between these sets of data points? This paper aims to provide a clearer understanding of when and how to utilize the, not often encountered, many-to-many relationship type within the RELREC domain.

Data Visualization and Reporting

DV-022 : Ggplotly – A Powerful Tool to bring Static Plots to Life
Madhura Nagarkar, Syneos Health

Figures play a fundamental role in clinical trials as they enable thorough analysis of clinical data. Interactive plots have immense potential to convey the key trial outcomes immediately. RStudio has gained popularity amongst data analysts across the pharmaceutical industry for statistical analysis and data visualization. Plotly and Ggplot2 are both widely used open-source packages in RStudio. Plotly is a powerful tool for creating interactive graphs; however, Ggplot2 is the popular data visualization tool amongst the wider data science community. The Ggplotly function is part of the Plotly package and brings static Ggplots to life. This paper mainly focuses on the implementation of the Ggplotly function to convert static plots generated by the Ggplot2 library to interactive plots, including layout and style customization options such as controlling the tooltip, zooming, panning, enabling mouse scrolling, and styling the hoverlabel. This paper includes a side-by-side comparison of the ggplotly() and plot_ly() functions to create a scatter plot of % change in the target lesion diameter by AVISITN using a dummy ADaM ADTR dataset. These functions are very powerful for displaying efficacy outcomes through graphs such as a waterfall plot for tumor response, a swimmer plot with time to response, and a KM plot for progression-free survival. Interactive figures also aid in the validation process. The paper concludes with some pros and cons of the Ggplot2 and Plotly packages and throws some light on how programmers can leverage their knowledge of Ggplot2 to create and customize interactive plots using the Ggplotly function.

DV-024 : Methods of a Fully Automated CONSORT Diagram Macro %CONSORT
Jeffrey Meyers, Regeneron Pharmaceuticals

The CONSORT diagram is commonly used in clinical trials to visually display the patient flow through the different phases of the trial and to describe the reasons patients dropped out of the protocol schedule. These diagrams are very tedious to make and update as they typically require using different software, such as Microsoft Visio or Microsoft PowerPoint, to manually create, align, and enter the values for each textbox. There have been several previous papers that explained methods of creating a CONSORT diagram through SAS, but these methods still required a great deal of manual adjustments to align all of the components. The %CONSORT macro removes these manual adjustments and creates a fully automated yet flexible CONSORT diagram completely from data. This presentation is a description of the methods used to create this macro.

DV-111 : Enhanced Spider Plot in Oncology
Ilya Krivelevich, Eisai Inc.
Cixin He, Eisai Inc.
Binbin Zhang-Wiener, Eisai Inc.
Wenyin Lin, Eisai Inc.

Graphs are an integral part of modern data analysis of clinical trial results. Viewing the data as a graph together with the results of statistical analysis can greatly improve the understanding of the collected data and the results. Graphical data display provides insight into trends and correlations that are simply not possible with tabular data; very often the visual representation can be the most informative way to understand results. The spider plot of change of tumor size from baseline is one of the more common graphs in oncological studies. Unlike the waterfall graph, which displays the total maximum change from baseline for each subject, the spider plot allows us to visualize change from baseline for the subject over time. In our experience, spider plots can also display other information, such as time-point responses, study drug dosage, and even some subject-level information, for example, the value of Best Overall Response. This additional information can be very helpful for reviewers. Some conclusions in this paper are based on RECIST 1.1 evaluation criteria and can be easily adjusted to any other tumor evaluation criteria.

DV-134 : Life Table Analysis for Time to First Event Onset
Abhinav Srivastva, Exelixis Inc

Life Table Analysis is a useful way to represent the proportion of subjects meeting an event of interest over a time period. It provides a good indicator of drug safety or toxicity over the course of a clinical trial due to the occurrence of a related event. For example, a life table can be used to study the relation between a drug which is highly immunogenic in nature and the type of events it can trigger, such as increased liver events indicating signs of liver disease. In this paper we take a graphical approach to represent this information, which is then enhanced to add exploratory and interactive features for the reviewer. Data preprocessing is done in SAS®, while all the plots are created in Python using open-source libraries such as matplotlib, seaborn, plotly and dash.

Leadership Skills

LS-004 : Are you a great team player?
Daryna Yaremchuk, Intego Group LLC

It is well known that being a team player is essential for managing projects successfully and that great collaboration within a team brings fruitful results. No matter the role and position, everyone is important in achieving common goals, meeting timelines, providing exemplary service, and achieving the best customer satisfaction. I am ready to bet that at least once during their career every person, whether a junior programmer or a team lead, has had doubts regarding their productive contribution to the project. After that it is normal to reflect on your teamwork performance. Moreover, it is so pleasant to receive an email saying: "It was a great pleasure to work with you and I hope there will be a new opportunity for our collaboration next time". Also, excellent team player skills are an essential part of the job description for all positions within the statistical programming community. Surely, it is not something that can be easily measured, in contrast to programming skills; nevertheless, it is a trait that is highly valued. In this paper the author discusses what it means to be a good team player depending on your role and position, describes common characteristics of good team players, suggests tips on how to improve your must-have teamwork skills, and shares her own experience in becoming a good team player, from "let sleeping dogs lie" to "a true role model".

LS-028 : Developing and running an in-house SAS Users Group
Stephen Sloan, Accenture

Starting an in-house SAS® Users Group can pose a daunting challenge in a large worldwide organization. However, once formed, the SAS Users Group can also provide great value to the enterprise. SAS users (and those interested in becoming SAS users) are often scattered and unaware of the reservoirs of talent and innovation within their own organization. Sometimes they are Subject Matter Experts (SMEs); other times they are new to SAS but provide the only available expertise for a specific project in a specific location. In addition, there is a steady stream of new products and upgrades coming from SAS Institute and the users may be unaware of them or not have the time to explore and implement them, even when the products and upgrades have been thoroughly vetted and are already in use in other parts of the organization. There are often local artifacts like macros and dashboards that have been developed in corners of the enterprise that could be very useful to others so that they don't have to "reinvent the wheel".

LS-038 : The Interview Process: An Autistic Perspective
Laura Needleman, AstraZeneca

Neurodiversity is an emerging topic within our industry. It is a newer diversity category that we as an industry are just beginning to understand and incorporate into DE&I plans. Many companies understand there are benefits to seeking out and hiring this talent. Topics include discussing various interview techniques and dissecting them from an autistic perspective. This presentation will also offer interviewing ideas that would better help to showcase neurodivergent talent during the interview process. I'll also be sharing my recommendations around providing interview questions in advance as well as formal skill testing as it relates to neurodivergent candidates.

LS-056 : Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
Josh Horstman, Nested Loop Consulting
Richann Watson, DataRich Consulting

While many statisticians and programmers are content in a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. In this paper, two seasoned consultants share their experiences going independent. Topics include the advantages and disadvantages of independent consulting, getting started, finding work, operating your business, and what it takes to succeed. Whether you're thinking of declaring your own independence or just interested in hearing stories from the trenches, you're sure to gain a new perspective on this exciting adventure.

LS-095 : Lessons Learned from a Retired SAS® Programmer
Carey Smoak, Retired

I had a successful 38-year career as an epidemiologist and as a statistical SAS programmer. I retired in August of 2021 and have had time to reflect on my career. I have seen a lot of innovation in my 38 years. I used to write reports using DATA _NULL_. The advent of ODS and PROC REPORT has made report writing much simpler. Today's servers are smaller, but more powerful than the mainframe computers that I used early in my career. But I have also learned a lot of lessons along the way, and I would like to share the lessons that I have learned. I'll also share some tips on preparing for retirement.

Metadata Management

MM-118 : Masters of the Table Universe: Creating Table Shells Consistently and Efficiently Across All Studies
Michael Hagendoorn, Seagen Inc.
Ran Li, Seagen Inc.
Mimi Vigil, Seagen Inc.
Shan Yang, Seagen Inc.

Creating specifications for tables, listings, and figures (TLFs) is traditionally not for the faint of heart. Over the course of weeks or more, our brave author valiantly translates the statistical analysis plan into sometimes hundreds of pages of detailed Microsoft Word-formatted specifications for their study. Meanwhile, two offices over, another author is going through the same painful process of creating a separate Word shell for their own study… and so on. While capturing TLF shells for each study in this manner is the norm, we developed an alternate approach by establishing a single set of shells at the compound level that provides the metadata for all CSR and integration TLFs across the entire product. Such a master shell setup yields many advantages:
• Faster shell development for new studies and less maintenance for existing ones
• Instant visibility into where studies differ on any detail
• Higher consistency across studies, which elevates quality in TLF specs and programming
• Increased programming efficiency through expanded macro coverage and less custom code
• Enhanced departmental standards adoption to benefit compliance and review
• Annotation to ADaM unlocks pathways for future submission documentation and metadata repository
• Potential for powerful metadata-driven output generation and other innovations
We will share the design, management, and governance model of our master shell implementation. We'll also discuss programming and biostatistics perspectives on the benefits and challenges we observed, along with our solutions – so you can immediately leverage this setup and unleash the power of compound-level specifications!

Quick Programming Tips

QT-030 : Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan, Accenture

We often only want to run parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are:
• Part of it has run already and the program timed out or encountered an unexpected error. It takes a long time to run so we don't want to re-run the parts that ran successfully.
• We don't want to recreate data sets that were already created. This can take a considerable amount of time and resources and can also occupy additional space while the data sets are being created.
• We only need some of the results from the program currently, but we want to preserve the entire program.
• We want to test new scenarios that only require subsets of the program.

QT-046 : Repetitive Analyses in SAS® – Use of Macros Versus Data Inflation and BY Group Processing
Brad Danner, IQVIA
Indrani Sarkar, IQVIA

While preparing clinical reports, we are commonly tasked to produce multiple outputs of the same analysis, using a different endpoint of interest, or slightly different populations of interest, or according to a suite of categorical subgroups. Naturally, we can accomplish such repetitive tasks efficiently using SAS with macro processing. Alternatively, with "data inflation", an approach that does not employ macro processing, we use carefully placed OUTPUT statements in the SAS DATA step to "inflate" the source data so that all variations of the multiple analyses are in one dataset, which can then pass through the analysis procedures once with BY-group processing. The objective of this article is to demonstrate these two approaches, either of which can be used for the purpose of analysis and review. Outputs from both approaches can be consolidated and exported into one source, which will make the review process less time-consuming. Time-to-event analyses (Kaplan-Meier and Cox regression) will be used to demonstrate both techniques, which will be discussed and compared.
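
The data-inflation idea can be illustrated outside SAS. The following is a minimal Python sketch, not the authors' code: the records, the variable names (`sex`, `age_group`, `value`), and the use of a simple mean in place of a time-to-event analysis are all assumptions made for illustration. Each source row is written out once per analysis it belongs to, and a single grouped pass then produces every summary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical source records: one row per subject.
subjects = [
    {"id": 1, "sex": "F", "age_group": "<65", "value": 10.0},
    {"id": 2, "sex": "M", "age_group": "<65", "value": 14.0},
    {"id": 3, "sex": "F", "age_group": ">=65", "value": 12.0},
    {"id": 4, "sex": "M", "age_group": ">=65", "value": 20.0},
]


def inflate(rows, subgroup_vars):
    """Write each row once for the overall analysis and once per subgroup variable
    (the role played by multiple OUTPUT statements in a DATA step)."""
    inflated = []
    for row in rows:
        inflated.append({**row, "analysis": "Overall", "level": "All"})
        for var in subgroup_vars:
            inflated.append({**row, "analysis": var, "level": row[var]})
    return inflated


def summarize_by_group(rows):
    """One pass over the inflated data, grouped on (analysis, level),
    analogous to BY-group processing."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["analysis"], row["level"])].append(row["value"])
    return {key: (len(vals), mean(vals)) for key, vals in groups.items()}


inflated = inflate(subjects, ["sex", "age_group"])
summary = summarize_by_group(inflated)
for key in sorted(summary):
    print(key, summary[key])
```

The trade-off is the one the abstract names: the inflated dataset is larger than the source (here three rows per subject), but the analysis step runs once instead of once per macro call.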

QT-123 : Introducing a QC Checklist for High Quality TLFs
Lauren Rackley, Syneos Health

Angry regulatory reviewers and angry clients are things you want to avoid in this industry. By following a QC checklist, error volume should decrease and you should have happier clients. Common errors in TLFs include the wrong denominator being used for percent calculations, missing rows, missing columns, incorrect sort order of columns or rows, incorrect footnotes, wrong column headers, and pagination issues. Production programmers should not send TLFs to QC until they have checked the output on their own. The programmer should not just make the output and think they're done. They should review the RTF and confirm that they have followed all the shell formatting and any specific programming notes. A QC checklist would enable programmers to know what to look for and reduce the back and forth in the QC process, especially if double programming was used. An efficient QC process is important when there are tight deadlines, and deadlines are critical in this industry.

Real World Evidence and Big Data

RW-050 : Real World Evidence in Distributed Data Networks: Lessons from a Post-Marketing Safety Study
Matthew Slaughter, Kaiser Permanente Center for Health Research
John Dickerson, Kaiser Permanente Center for Health Research
Jennifer Kuntz, Kaiser Permanente Center for Health Research

As a case study to illustrate both opportunities and challenges in distributed data networks, this paper will focus on the implementation of a post-marketing safety study in the Healthcare Systems Research Network (HCSRN) via the associated Virtual Data Warehouse (VDW) common data model. In response to an FDA post-marketing requirement, this study establishes the incidence of angioedema in chronic heart failure patients treated with Sacubitril/Valsartan and incorporates data from multiple distributed data networks, including HCSRN. Distributed data networks present exciting opportunities for gathering real-world evidence by pooling standardized datasets across institutions. Common data models facilitate the efficient allocation of programming work to conduct analysis while allowing participating sites to retain control of their own data. However, large-scale and high-quality data collection combining data from disparate health data systems presents technical, administrative, and scientific challenges. In addition to describing programming, data management, and validation techniques used by HCSRN analysts in this study, we will compare design choices made by the various data networks involved in the project, and explore their practical consequences.

Solution Development

SD-069 : Application of Tipping Point Analysis in Clinical Trials using the Multiple Imputation Procedure in SAS
Yunxia Sui, AbbVie
Xianwei Bu, AbbVie

In phase 3 clinical studies, tipping point analysis has been increasingly requested by regulatory agencies as a sensitivity analysis under the missing not at random (MNAR) assumption to assess the robustness of the primary analysis results. One way to implement the tipping point analysis is to use the SAS procedure PROC MI in two steps: step one imputes missing data using multiple imputation (MI) under the missing at random (MAR) assumption, and step two uses the MNAR statement to adjust the MI imputed values by a pre-specified set of shift parameters for each treatment group independently. The tipping points are the outcomes where the significance of the treatment effect is just reversed. In practice, the actual shifts to the MI imputed values are not always exactly the same as the shift parameters specified in the MNAR statement. We summarize our experience with this issue and potential pitfalls in implementing the tipping point analysis using PROC MI and propose alternative options such that the expected shift can be achieved. We propose a tipping point analysis method using a multiple imputation approach for both continuous and binary endpoints.

SD-070 : Do it the smart way, renumber with PowerShell scripts!
Menaga Guruswamy Ponnupandy, ICON

When you receive a client request to renumber a large number of programs, it can be a time-consuming task to go to each program, rename the file and replace the old file name or number references with the new one manually. SAS programmers came up with the idea of using macros to simplify this task, but the downside of this approach is that it can be difficult to edit and debug the macro for your specific needs. Additionally, these macros may not work dynamically to change all occurrences of a pattern or if the client requests a different numbering scheme in the future. One way to overcome these limitations is to use PowerShell scripts instead of SAS macros. PowerShell is a powerful scripting language that is easy to learn and provides much more flexibility than SAS macros. Plus, PowerShell scripts are dynamic and can be easily modified to accommodate different numbering schemes. If you're not familiar with PowerShell, there are many resources available online that can help you get started. Using PowerShell to renumber your programs can be a quick and easy way to save time and ensure that your programs are correctly numbered according to your client's specifications.
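
The abstract describes a PowerShell solution; the core step, replacing every occurrence of an old program number with its new one, can be sketched in Python for illustration. This is not the author's script: the naming pattern (`t_14_1_1`) and the mapping are hypothetical. All replacements happen in a single regular-expression pass, so a number that has just been renumbered is not renumbered again by a later rule.

```python
import re


def renumber(text, mapping):
    """Replace each whole-token occurrence of an old program name with its new name.

    A single pass avoids chained replacements (old A -> B, then B -> C),
    and the word-boundary lookarounds keep 't_14_1_1' from matching
    inside 't_14_1_10'.
    """
    alternatives = "|".join(re.escape(old) for old in mapping)
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Hypothetical renumbering scheme requested by a client.
mapping = {"t_14_1_1": "t_14_1_2", "t_14_1_2": "t_14_1_3"}

source = "%include 't_14_1_1.sas'; /* see t_14_1_2 and t_14_1_10 */"
print(renumber(source, mapping))
```

The same function can be applied to a file name as well as to file contents, so one mapping drives both the rename and the reference update.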

SD-084 : A Macro Utility for CDISC Datasets Cross Checking
Chao Su, Merck
Jaime Yan, Merck
Changhong Shi, Merck

High-quality data in clinical trials is essential for compliance with Good Clinical Practice (GCP) and regulatory requirements. However, in practice, data issues exist both within and between ADaM and SDTM datasets. In order to identify and clean data issues before database lock (DBL) or other major milestones, a macro was developed to cross-check discrepancies between ADaM and SDTM datasets during analysis and reporting processes. In this paper, some common data checks among ADaM and SDTM datasets are presented and discussed. The findings are reported in an Excel spreadsheet with a friendly interface consisting of a neat summary tab and an individually formatted tab for each data issue category. Moreover, the modularized structure provides excellent scalability and flexibility for the user to add a user-defined rule with simple and easy steps. This feature allows the macro to be used far beyond CDISC datasets. User-defined rules can be extended to various data structures and types across therapeutic areas and studies. This utility provides a friendly and flexible way to check and track data issues related to the A&R process accurately and efficiently.

SD-098 : Challenges of Developing Microbiology Dataset
Jenny Zhang, Merck & Co., Inc
Shunbing Zhao, Merck & Co.

Antimicrobial resistance (AMR) is increasingly being recognized as a global threat to public health, with microbiology data providing important information used to guide clinical development of a new investigational drug. We had an opportunity to develop a microbiology dataset for an infectious disease clinical trial and faced many challenges in developing this complicated dataset. This paper summarizes these challenges and how we overcame them: (1) translating the completed dataset specification into a streamlined algorithm and SAS code for genotypic and phenotypic data, providing positioning variables for hundreds of amino acid sequence data for PR (protease) and RT (reverse transcriptase); (2) difficulties in merging HIV lab data with Geno and Pheno data due to visit window overlapping or mismatching; (3) challenges dealing with the slash character embedded in the mutation test results; and (4) validation challenges given the many different sessions and large number of variables.

SD-103 : A SAS Macro to Perform Consistency Check in CSR Footnote References
Jeff Xia, Merck
Chandana Sudini, Merck

Footnotes are an important part of the tables, figures, and listings (TFLs) for a CSR. They may include, but are not limited to, abbreviations, acronyms, and additional explanations that add value to the TFLs. Footnotes are normally provided in sequential order, for example numbered with Roman numerals or labeled alphabetically. It is common to see inconsistent footnote references between the body of a TFL and its footnote section, i.e., a reference number appears in the TFL body but there is no corresponding reference number in the footnote section, or vice versa. Catching these inconsistencies by eye is an attention-demanding and error-prone task. This paper introduces a SAS macro that compares the list of footnote reference numbers in the body of each TFL with those in the footnote section and flags any discrepancies for each TFL. In addition, the macro generates a report that lists the name of each TFL and the details of the discrepancies. We suggest running this macro after the TFLs are produced as part of the dry run package and before delivering the final TFLs to Clinical for the CSR.
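
The comparison at the heart of such a check is a two-way set difference. Below is a minimal Python sketch of that idea only; it is not the SAS macro from the paper. The marker convention (a lowercase letter in square brackets, such as `[a]`) and the sample text are assumptions chosen for illustration.

```python
import re

# Assumed marker convention for this sketch: a lowercase letter in square brackets.
MARKER = re.compile(r"\[([a-z])\]")


def check_footnotes(body_text, footnote_text):
    """Compare reference markers used in the output body with those defined
    in the footnote section and report discrepancies in both directions."""
    body_refs = set(MARKER.findall(body_text))
    note_refs = set(MARKER.findall(footnote_text))
    return {
        "missing_footnote": sorted(body_refs - note_refs),  # used but never defined
        "unused_footnote": sorted(note_refs - body_refs),   # defined but never used
    }


body = "Overall survival [a]\nMedian (95% CI) [b]\nHazard ratio [d]"
notes = "[a] Kaplan-Meier estimate.\n[b] Brookmeyer-Crowley method.\n[c] Stratified log-rank test."

result = check_footnotes(body, notes)
print(result)
```

Running the check per output and collecting the non-empty results gives the kind of discrepancy report the abstract describes.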

SD-109 : Importance of Creating a Learning Portal for Statistical Programming End-to-End Processes
Yogesh Pande, Merck Inc.
Donna Hyatt, Merck & Co., Inc.
Brandy Cahill, Merck & Co., Inc.

In the highly regulated pharmaceutical industry, there are multiple Standard Operating Procedures (SOPs) covering the mandatory processes that a clinical/statistical programmer needs to follow in their day-to-day programming activities. Multiple SOPs/processes can cause confusion and lead to non-compliance with the processes followed within the Statistical Programming Department (SPD). To avoid this situation, the statistical programming leadership team came up with the idea of creating a site/portal that lists important topics in one place as hyperlinks, with each topic explained from start to end. For this very reason, the statistical programming end-to-end (SP E2E) learning portal gained a lot of popularity within SPD among junior programmers, new hires, and even senior programmers. The goal of understanding which process should be followed and when it should be followed was achieved. Developing a learning portal also ensured that details for each process/topic reached the right audience and that the expectations are understood by every programmer within SPD. This paper explains the specific format used for each topic and the review process followed for each topic before it is published in the SP E2E portal.

SD-122 : Building an Internal R Package for Statistical Analysis and Reporting in Clinical Trials: A SAS User's Perspective
Huei-Ling Chen, Merck & Co.
Heng Zhou, Merck & Co.
Nan Xiao, Merck & Co., Inc.

The presence of the R programming language has been on the rise in the analysis and reporting sector of the pharmaceutical industry. Just like SAS programmers regularly write SAS macros, it is common for R users to write R functions for repeating tasks. A robust and reusable R function facilitates the programming work. An R package is like a well-built SAS macro library, including a collection of functions, the documentation that teaches people how to use the functions, sample data, and the testing code with validation evidence. An R package formalizes access to the R functions. Yet, creating an R package from scratch has a learning curve with certain challenges, especially for first-timers. This paper outlines the essential components of an R package and the valuable tools to help create these components. Relevant online reference materials are provided as well.

Statistics and Analytics

SA-068 : Validating novel maraca plots – R and SAS love story
Martin Karpefors, AstraZeneca
Srivathsa Ravikiran, AstraZeneca
Samvel Gasparyan, AstraZeneca

Hierarchical composite endpoints (HCE) are complex endpoints combining outcomes of different types and different clinical importance into an ordinal outcome that prioritizes the clinically most important (e.g., most severe) event of a patient. HCE can be analyzed with the win odds, an adaptation of the win ratio to include ties. One of the difficulties in interpreting HCE is the lack of proper tools for visualizing the treatment effect captured by HCE, given the complex nature of the endpoint. The recently introduced maraca plot solves this issue by providing a comprehensive visualization that clearly shows the treatment effects on the HCE and its components. The maraca package in R provides an easy-to-use implementation of maraca plots, building on powerful features provided by the ggplot2 package. The maraca package also provides the calculations for the complex statistical analyses involved in deriving the maraca plot, including the overall treatment effect characterized by win odds. An important gap in the package is the question of how to validate the analyses involved in deriving the maraca plots. In this paper we will demonstrate an approach using SAS to validate the outputs generated by the R maraca package and thereby combining the best of two worlds: the flexible plotting capabilities of R and the powerful data manipulation and statistical analysis tools of SAS.
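
For readers unfamiliar with the win odds, the quantity can be computed from all active-versus-control pairs by splitting ties evenly between wins and losses. The sketch below is an independent Python illustration, not code from the maraca package or from the paper; it assumes each patient's HCE has already been reduced to a single ordinal score where a higher value is the better outcome, and the scores shown are made up.

```python
def win_odds(active, control):
    """Compare every active patient with every control patient.

    Returns (wins, ties, losses, win_odds), where
    win_odds = (wins + 0.5 * ties) / (losses + 0.5 * ties).
    Assumes a higher score is a better outcome.
    """
    wins = ties = losses = 0
    for a in active:
        for c in control:
            if a > c:
                wins += 1
            elif a < c:
                losses += 1
            else:
                ties += 1
    denominator = losses + 0.5 * ties
    if denominator == 0:
        raise ValueError("win odds undefined: no losses and no ties")
    return wins, ties, losses, (wins + 0.5 * ties) / denominator


# Hypothetical ordinal HCE scores for two small arms.
active_scores = [3, 3, 2, 1]
control_scores = [2, 1, 1, 3]

wins, ties, losses, odds = win_odds(active_scores, control_scores)
print(wins, ties, losses, round(odds, 3))
```

Dropping the tie terms from numerator and denominator gives the win ratio, which is why the win odds is described as an adaptation that includes ties. An independent calculation of this kind is one simple way to cross-check a packaged result.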

Strategic Implementation & Innovation

ST-005 : What's the story in your subgroup analysis
Lucy Dai, Abbvie

As noted in the NEJM guidelines (November 2007), investigators frequently use analyses of subgroups of study participants to extract as much information as possible. Such analyses ... may provide useful information for the care of patients and for future research. However, subgroup analyses also introduce analytic challenges and can lead to overstated and misleading results. The purpose of this paper is to present some of the challenges in understanding and interpreting subgroup analysis results through three examples from real clinical trials.

ST-006 : Digital Data Flow (DDF) and Technological Solution Providers
Piyush Singh, TCS

Digital Data Flow (DDF) is an initiative to organize and automate the processing of clinical data and the study protocol. From a technology perspective, one of the key purposes of this initiative is to deliver technical standards that can be used to automate the study execution process, create a flexible solution, and minimize manual effort during the study life cycle. One of the most important principles of the DDF initiative is being vendor agnostic, which means that different organizations can implement their solution in their own way, using the reference architecture (RA) from DDF, from both process and technology perspectives. This paper explains how technology providers and technological product vendors can use the DDF deliverables to help pharmaceutical companies with new solutions/platforms that innovate and automate their manual and traditional study execution process, which ultimately can help to reduce overall cost, study duration, and operational effort, and increase the return. This paper also explains how pharma companies can use the strength of technology to take maximum advantage of DDF.

ST-010 : Key required infrastructure for digital transformation in pharmaceutical clinical data operations
Srinivasa Rao Mandava, Merck


ST-102 : Key Statistical Programming Considerations in External Collaborative Clinical Trials
James Zhao, Merck & Co., Inc.
Hong Qi, Merck & Co., Inc.
Mary Varughese, Merck & Co., Inc.

Collaboration between the pharmaceutical industry (sponsor) and external partners is becoming increasingly popular in drug development as it can be mutually beneficial. Throughout this collaboration, study activities from start-up to Interim Analysis (IA) are often performed by the partner who conducts the study, and the study data is transferred to the sponsor to support regulatory submission, which requires CDISC-compliant SDTM and ADaM datasets. This has brought many challenges due to the inconsistency in data collection among partners, the quality or format of data used for Analysis and Reporting (A&R), and the timing of the sponsor's access to the data for evaluation and transformation according to the regulatory requirements. This paper discusses some key programming considerations during this process to improve the efficiency of data issue resolution, data transformation, statistical report generation and submission package preparation.

ST-106 : Automation of Dataset Programming Based on Dataset Specification
Liqiang He, Atara Biotherapeutics

In the clinical trial field, standard datasets, such as SDTM domains and ADaM datasets, are an integral part of the electronic submission package and also the prerequisite for TFL generation. Dataset programming is a time-consuming, tedious task for SAS programmers. Highly efficient, automated programming for dataset generation will prevent manual programming typos, save programming time and resources, and deliver high-quality work. The dataset specification is a detailed instruction for dataset programming and the major reference for dataset validation. This paper demonstrates a new practical approach to automate dataset programming based on the dataset specification. For auto-programming to succeed, it is pivotal to write variable derivations in the dataset specification in a standardized syntax, with the aid of keywords and punctuation marks, so that they can be read and translated into SAS code by SAS.

Submission Standards

SS-003 : Handling CRS/NT Data in CAR-T Studies and Submission
Joe Xi, Bristol Myers Squibb
Yuanyuan Liu, Bristol Myers Squibb

Chimeric antigen receptor T (CAR-T) cell therapy has been a popular therapy in recent years, and many pharmaceutical / biotech companies are developing their pipelines using this new technology. In most CAR-T studies, cytokine release syndrome (CRS) and neurotoxicity (NT) are the most common types of toxicity caused by CAR-T cells. As a result, the management of CRS and NT becomes essential. During regulatory submissions, agencies have required special or extra data and analysis beyond regular AE data reporting. To facilitate the analysis, we have produced supplemental specialized CRS and NT datasets in response to FDA requests. As a team from Juno/Celgene/BMS, we have worked on the filing of both Breyanzi and Abecma (2 out of 6 available CAR-T products) consecutively and accumulated some experience on how to handle the data for CRS/NT. In this paper, we hope to share our experience on the following two topics: 1. How we organized CRS/NT data in the SDTM/ADaM package to support the CSR. 2. How we supported health authority (HA) review by providing supplemental CRS and NT data requested by the FDA review team. We hope this paper can provide some insights to other teams who are working on CAR-T studies, or preparing CAR-T product submission packages, from the perspective of handling CRS/NT data.

SS-045 : Optimizing Efficiency and Improving Data Quality through Meaningful Custom Fix Tips and Explanations
Jennifer Manzi, Pinnacle 21
Julie Ann Hood, Pinnacle 21

Prior to submitting study data to any regulatory agency, the standardized data is evaluated using at least one validation tool to ensure data conformance and quality. The issues in the resulting report are then triaged, first to determine the cause of the issue, then to decide on the best course of action to resolve or address each one. Knowing which issues should be researched and which can only be addressed through explanations in the Reviewer's Guide is crucial to using a validation report efficiently. Guidance provided by an organization on how to approach validation issues can help save time and ensure the issue is addressed correctly, resulting in higher quality and consistent data across studies. This paper will focus on steps to create meaningful Fix Tips that will enable users to quickly evaluate how to research and resolve an issue. It will also include examples of when an explanation is the best option and the details needed to create comprehensive standardized explanations.

SS-112 : Standardization of Reactogenicity Data into Findings
Charumathy Sreeraman, Ephicacy Lifescience Analytics

Reactogenicity events are the key safety assessment in the vaccines therapeutic area (TA). Reactogenicity refers to a particular expected or generic reaction following vaccine administration. The term reaction usually implies that the adverse event has a causal relationship with the vaccination, or at least that there exists a distinct possibility. Reactogenicity is evaluated by observing a pre-specified set of adverse events over a pre-defined observation period. Standardizing the reactogenicity data in the SDTM datasets implements the 'Flat Model' strategy prescribed by the Vaccines TAUG. This paper discusses standardizing the diary data from study subjects into Findings domains, as applicable, following the 'Flat Model' strategy.

SS-127 : Proposal for New ADaM Paired Variables: PARQUAL/PARTYPE
Elizabeth Dennis, EMB Statistical Solutions, LLC
Monika Kawohl, mainanalytics
Paul Slagle, IQVIA

For more than a decade, producers have struggled to create unique PARAM values to fully describe each analysis parameter. Even when it is less efficient to have fully unique PARAMs, it has been the requirement. With ADaM IG v3.0 this is expected to change. PARQUAL (and the paired variable PARTYPE) are expected additions that will allow PARAM to identify multiple analysis parameters. These are special-purpose variables that are intended to be an exception, not a common occurrence. In most cases they will be unnecessary. However, when the meaning of PARAM essentially remains unchanged except for a single qualifier (such as 'Investigator' or 'Central Reader'), PARQUAL can be a useful tool to simplify PARAM. This paper will summarize the current requirements of PARAM and PARCATy. It will review the history of past proposals for PARQUAL and its existence in TAUGs and other documents. The new requirements for PARQUAL and PARTYPE will be introduced, along with examples of correct usage. Examples of disallowed use cases will also be discussed. Finally, the status of the associated controlled terminology will be presented.


EP-018 : Visualization for Success: Driving KPIs and Organizational Process Improvements via Portfolio-level Analytics
Philip Johnston, Pinnacle 21
Julie Ann Hood, Pinnacle 21

Cloud-based data diagnostic platforms enable organizations to build institutional memory and drive process improvements. Platforms that passively aggregate metrics spare teams from having to "wrangle KPIs" and instead visualize the macro-level trends in their data quality, conformance, standards adoption, submission risks, and team activity at a glance. This poster highlights how these data are showcased in P21 Enterprise's built-in Analytics module and suggests actionable steps based on these trends to support inter- and intra-departmental process improvements. It also demonstrates how the various portfolio-level reports, filters, and views now available within the application support organizations in their coordination efforts and the development of best practices. Impactful use cases include: benchmarking data quality across therapeutic areas and over time, eliminating Reject Issues, monitoring the uptake of new standards, prioritizing Issues for which to create standardized explanations, developing guidance for frequently occurring Validation Rules, visualizing efforts to balance workloads, and encouraging documentation through "gamification."

EP-089 : Creating a Centralized Controlled Terminology Mapping Repository
Danny Hsu, Seagen
Shreya Chakraborty, Seagen

Controlled Terminology (CT) is the set of code lists and valid values used with data items within CDISC-defined datasets for regulatory submission. The use of CT helps harmonize the data for all submitted studies and improves the efficiency of data review. It also opens the door to powerful internal automation for heightened quality and efficiency. The question of how to most effectively map the collected data, especially any free-text values from CRFs, into CT plays a very important role in each study and product team's development of CDISC datasets that are not only consistent and compliant, but also support analyses and summaries efficiently. This article shares a concept of creating a centralized repository (such as a spreadsheet) to help streamline the process of CT mapping. A centralized CT mapping sheet is generated by collecting all mapped terms from previous studies and is used to identify the new terms that need attention. Following this process, study teams can save time by focusing only on the newly added terms, thereby improving the efficiency of the data-mapping process and the quality of CDISC dataset generation and review. A centralized CT mapping sheet can not only support the creation of the code lists and value lists in define.xml, but also provide the format mapping used in SAS programs.
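
The core idea of the repository, reusing previously mapped terms and flagging only new ones for review, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `split_terms`, the dictionary-based repository, and the sample terms are all hypothetical, and a real repository would more likely be a spreadsheet or dataset maintained per code list.

```python
def split_terms(repository, collected_terms):
    """Split collected terms into already-mapped and new (unmapped) terms.

    repository: dict of normalized collected term -> CT submission value
                (hypothetical stand-in for the centralized mapping sheet).
    collected_terms: raw values as collected on the CRF.
    """
    mapped = {}
    unmapped = []
    for raw in collected_terms:
        key = raw.strip().upper()  # normalize case and surrounding whitespace
        if key in repository:
            mapped[raw] = repository[key]
        else:
            unmapped.append(raw)  # needs manual review by the study team
    return mapped, unmapped


# Hypothetical repository built from previous studies.
repository = {"TABLET": "TABLET", "TAB": "TABLET", "CAPS": "CAPSULE"}

mapped, unmapped = split_terms(repository, ["tab", "Caps ", "oral soln"])
print(mapped)
print(unmapped)
```

In this sketch only `"oral soln"` would be routed for manual mapping; once reviewed, its mapping could be added to the repository so that later studies pick it up automatically.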