List of abstracts subject to change. Last updated 25-Aug-2015.



Data Analysis

Principles of intervention or event date imputation
Zhao Chunpeng

Principles of intervention or event date/time imputation
- H0: The study drug is not safe enough. Therefore:
  - The imputed start date/time should be as close to the reference start date/time as possible.
  - The duration of the intervention or event should be as long as possible.
- A more complete date/time is more reliable.

End date/time imputation
- Principle: the duration of the intervention or event should be as long as possible.
- If the incomplete end date/time is before the reference start date/time: impute as close to the reference start date/time as possible.
- If the incomplete end date/time is after the reference start date/time, or uncertain: impute as far from the reference start date/time as possible, without exceeding the reference end date/time.

Start date/time imputation
- Principle 1: the imputed start date/time should be as close to the reference start date/time as possible.
- Principle 2: the duration of the intervention or event should be as long as possible.
- If the incomplete start date/time is before the reference start date/time: trade off principle 1 against principle 2 by imputing the mid date/time of the possible period (the expected value): mid of day, 12:00; mid of month, the 15th; mid of year, July 1st.
- If the incomplete start date/time is after the reference start date/time, or uncertain: impute as close to the reference start date/time as possible.
- Principle: a more complete date/time is more reliable. If the imputed start date/time falls on or after a more complete end date/time, impute the start the same way as the end date/time.
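A minimal sketch of the mid-point rule for a partial ISO 8601 start date (dataset and variable names hypothetical):

    data ae_imp;
       set ae;
       format aestdt yymmdd10.;
       /* Partial ISO 8601 start date: year only, year-month, or complete */
       if length(aestdtc) = 4 then
          aestdt = mdy(7, 1, input(aestdtc, 4.));              /* mid of year: July 1st  */
       else if length(aestdtc) = 7 then
          aestdt = mdy(input(substr(aestdtc, 6, 2), 2.), 15,   /* mid of month: the 15th */
                       input(substr(aestdtc, 1, 4), 4.));
       else if length(aestdtc) >= 10 then
          aestdt = input(substr(aestdtc, 1, 10), yymmdd10.);
    run;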


SAS Bayesian Procedure Applications in Clinical Trials
Aijun Gao, inVentiv Health Clinical

Multiple software packages and languages are currently available for Bayesian analyses, which makes Bayesian computation much easier and more practical. SAS is one of the user-friendly packages that can be used for Bayesian analyses and has been applied in clinical trial data analysis and simulations. Both binary and continuous clinical study data analysis examples will be presented. Detailed SAS code will be provided and discussed. In addition, more general types of applications of SAS Bayesian procedures in clinical trial projects will be shared from real experiences.
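As a hedged illustration of the kind of analysis described, the BAYES statement in PROC GENMOD fits a Bayesian logistic model to binary trial data (dataset and variable names hypothetical):

    proc genmod data=trial descending;
       class trt(ref='Placebo') / param=ref;
       model response = trt / dist=binomial link=logit;
       /* Bayesian estimation: burn-in, posterior sample size, posterior dataset */
       bayes seed=20150825 nbi=2000 nmc=10000 outpost=posterior;
    run;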


Adverse Event Data Programming for Infant Nutrition Trials
Ganesh Lekurwale, Singapore Clinical Research Institute Pte Ltd
Parag Wani, Singapore Clinical Research Institute Pte Ltd

In infant nutrition trials, the study product (e.g., supplementary formula feeding) is often considered to affect the safety profile only on the day it is consumed. Thus, it is important to present the safety data by study product consumption pattern, such as fully on study product; study product plus breast feeding or other complementary feeding; or no study product. This unique way of presenting the safety data adds complexity to the programming, as subjects frequently switch from one consumption pattern to another during the trial period. However, such presentation of safety data is very helpful for interpreting the study product's safety profile under real-life scenarios. This paper describes a report layout for reporting adverse events by study product consumption pattern and explains its programming aspects.


Cutpoint Determination Methods in Survival Analysis using SAS®: Updated %FINDCUT macro
Jayawant Mandrekar, Mayo Clinic
Jeffrey Meyers, Mayo Clinic

Statistical analyses that use data from clinical or epidemiological studies often include continuous variables such as patient age, blood pressure, and various biomarkers. Over the years there has been an increase in studies that focus on assessing associations between biomarkers and the disease of interest. Many of these biomarkers are measured as continuous variables. Investigators seek to identify a possible cutpoint to classify patients as high risk versus low risk based on the value of the biomarker. Several data-oriented techniques, such as the median and upper quartile, and outcome-oriented techniques based on score, Wald, and likelihood ratio tests are commonly used in the literature. Contal and O'Quigley (1999) presented a technique that uses the log-rank test statistic to estimate the cutpoint. Their method was computationally intensive and hence was overlooked due to the unavailability of built-in options in standard statistical software. In 2003, we provided the %FINDCUT macro, which uses Contal and O'Quigley's approach to identify a cutpoint when the outcome of interest is measured as time to event. Over the past decade, demand for this macro has continued to grow, which has led us to update %FINDCUT to incorporate new tools and procedures from SAS such as array processing, the Graph Template Language, and the REPORT procedure. New and updated features include: results presented in a much cleaner report format, user-specified cutpoints, macro parameter error checking, temporary data set clean-up, preservation of current option settings, and increased processing speed. We intend to present the utility and added options of the revised %FINDCUT macro using a real-life dataset. In addition, we will critically compare this method with some existing methods and discuss the use and misuse of categorizing a continuous covariate.
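A hedged sketch of the outcome-oriented idea behind Contal and O'Quigley's method: scan candidate cutpoints, dichotomize at each, and record the log-rank statistic from PROC LIFETEST (names hypothetical; %FINDCUT adds the appropriate test corrections):

    %macro scancut(data=, marker=, time=, censor=, cuts=);
       %local i cut;
       %do i = 1 %to %sysfunc(countw(&cuts, %str( )));
          %let cut = %scan(&cuts, &i, %str( ));
          data _grp;
             set &data;
             highrisk = (&marker > &cut);   /* dichotomize at the candidate cutpoint */
          run;
          ods output HomTests=_lr;          /* capture the test-of-equality table    */
          proc lifetest data=_grp;
             time &time*&censor(1);
             strata highrisk;
          run;
          data _lr;
             set _lr(where=(test = 'Log-Rank'));
             cutpoint = &cut;
          run;
          proc append base=allcuts data=_lr force;
          run;
       %end;
    %mend scancut;

The candidate maximizing the statistic in ALLCUTS is the estimated cutpoint; the method then adjusts the significance level for the multiple looks.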


Detecting Suspected Fraudulent Data in Clinical Trials
Xiaodong Shi, Boehringer Ingelheim, Shanghai, China

In medical research, fraud in clinical trials constitutes a serious breach of ethics relating to compliance with the trial protocol. In addition to potentially putting the health and well-being of patients at stake, fraud renders the data and conclusions of the entire clinical trial questionable and undermines the trustworthiness of the scientific results. Thus, the sponsor's investment in the clinical trial and the entire clinical program may be at risk, in particular with respect to the approval of a new drug application by regulatory authorities.


Not Just Merge - Complex Derivation Made Easy by Hash Object
Lu Zhang, PPD

The hash object is known as a data look-up technique widely used in DATA steps for its many advantages. Before SAS 9.2, the hash object was mostly used to accomplish efficient data merging in a DATA step; it did not allow storing and retrieving duplicate keys. This constraint was eliminated in SAS 9.2, where hash objects in a DATA step can perform data look-up even when the keys are not unique. With this improvement, many complex derivations in our daily work become more straightforward and simpler than before. In this paper, the features added to the hash object in SAS 9.2 will be discussed. Examples from analysis database derivations will also be given to illustrate how the hash object improves the implementation efficiency of complicated derivation algorithms.
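A hedged sketch of the duplicate-key pattern the paper discusses, using MULTIDATA:'YES' with FIND and FIND_NEXT (dataset and variable names hypothetical):

    data matched;
       if _n_ = 1 then do;
          if 0 then set cm(keep=cmtrt cmstdt);   /* define host variables in the PDV */
          /* Load concomitant medications, allowing duplicate USUBJID keys */
          declare hash cm_h(dataset:'cm', multidata:'yes');
          cm_h.definekey('usubjid');
          cm_h.definedata('cmtrt', 'cmstdt');
          cm_h.definedone();
       end;
       set ae;
       call missing(cmtrt, cmstdt);
       /* Retrieve every CM record for this subject, not just the first */
       rc = cm_h.find();
       do while (rc = 0);
          output;
          rc = cm_h.find_next();
       end;
    run;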


Data Visualization and Graphics

GTL makes it possible: A picture is worth a thousand words
Lynda Li, Roche Product Development in Asia Pacific (PDY)

"A picture is worth a thousand words." This statement may sound a little exaggerated, but in the world of clinical trial, meaningful graphics do play a very important role for the statisticians or clinical scientists to make the decision in time. As statistical programmers, our main task is to transfer the complicated data into meaningful graphs to show the insights and interpret the relationship among direct or indirect or even false connections. SAS 9.4 powerful GTL brings good news for programmers and makes it possible for programmers to illuminate their hypotheses by graph easily and intuitively. This paper will demonstrate how SAS 9.4 GTL powerfully combine PK concentration individual plot and mean plot with key summary statistics to meet stakeholders' needs.


Build Child Growth Charts Using SAS GTL
Rajesh Moorakonda, Singapore Clinical Research Institute Pte Ltd

Creating child growth charts in SAS with different requirements, such as plotting separate lines for the P5, median, and P95 for each treatment group, distinguishing the treatment period with multi-pattern lines, using unequal axis intervals, and printing an axis-aligned statistics table with the same treatment color codes below the graph within the graphics area, is a daunting task in child nutrition trials. This paper explains the difficulties faced during the development of a growth chart and their step-by-step programming solutions using the SAS 9.3 Graph Template Language, along with an example.


A Fully Automated Approach to Concatenate RTF outputs and Create TOC
Zhiping Yan, Covance
Lugang Larry Xie, Merck & Co.

Statistical reports for clinical studies usually contain large numbers of tables, listings, and figures. As ODS RTF has gained popularity in clinical trial reporting, it is extremely useful to concatenate RTF reports and create a hyperlinked table of contents (TOC) to facilitate review. However, the approaches introduced by existing papers have issues in one way or another: a manual step must be involved (the TOC field is updated by pressing F9), an anchor ID must be inserted in the reports by each individual SAS programmer, and so on. This paper introduces a relatively simple and very efficient SAS approach to concatenate reports generated by ODS RTF and create a nicely formatted, hyperlinked table of contents. It utilizes the bookmark/hyperlink features of RTF, allowing the TOC to be generated with no manual process and no hidden text. At the end, this paper also discusses potential issues and future improvements.


Management & Career Development

Managing the analysis programming effort for an NDA submission
Yu Cheng, Eli Lilly and Company
Quan Zhou, Eli Lilly and Company

Preparing for an NDA submission takes a lot of work, and statistical programming plays a key role in making sure sufficient and accurate information is provided to the regulatory agencies. This paper presents the experiences and lessons learned from managing the statistical programming effort to support a recent NDA submission to the FDA. We will present the challenges and what we did to reduce risk and ensure efficiency and quality. In addition to sharing our experiences, we will also discuss lessons learned and our proposals for future improvement. We hope the topics covered will apply, and be beneficial, to most other programming projects.


Developing Global Clinical Programming Team with Qualification and Cost Minimization in China
Margaret Li
Lulu Swei, PRA Health Sciences

Today many global pharmaceutical companies and Contract Research Organizations (CROs) are building their global clinical programming teams in Asia Pacific, especially in China. Due to the high demand for talented clinical programmers in China, building a global clinical programming team is a big challenge. Building the team with quality and cost minimization is an even bigger challenge in all aspects: recruiting, training, roles and responsibilities, communication between the internal remote team and the external global PRA team, resource management, quality control, lessons learned, and so on. This paper describes the challenges and solutions in building the clinical programming team at the Shanghai, Beijing, and Wuhan sites of WuXi PRA over the past (almost) two years.


Getting clouds moving across the Pacific - a case study on working with a Chinese CRO on SAS® Drug Development
Chen Shi, Santen Inc.

California has been in a long drought, while some places across the Pacific have been suffering from storms during the past two years. Why not move the clouds over? This could be just a dream, but not only a dream if we know how to manage the clouds. In this case study, we share our experiences and lessons learned in delivering study packages on the SAS® Drug Development (SDD) platform with a newly on-boarded Chinese CRO. This paper elaborates on topics including project setup, training, scheduling, and communications, as well as debugging, quality control, and project summarization. It was a fun and challenging project, and it proved that SDD can be a useful tool for enabling cross-continent teams to work together.


How to Build an "Offshore" Team with "Onshore" Quality
Lulu Swei, PRA Health Sciences
Margaret Li

In today's competitive global market, many pharmaceutical companies and CROs are building offshore clinical programming teams for various reasons. Oftentimes, we hear of offshore outsourcing deals gone bad. Over the past year and a half, PRA Health Sciences (PRAHS) has successfully built a clinical programming team through a joint venture with WuXi AppTec in China. Currently we have clinical programming teams in Shanghai, Beijing, and Wuhan. They have truly become an extension of our global programming team and enable us to provide around-the-clock service for our sponsors. This paper will discuss the challenges we faced and the solutions throughout the process in the following areas: 1. Recruitment and Retention 2. Training and Certification 3. Global and Local Management Support 4. Resourcing Process and Policy 5. Project Governance


Job-Oriented Training Program for Clinical SAS Programmers -- One Year Later
Lixiang Yao

Extending the topic presented last year on the training program for beginners, the author would like to share more experience gained from putting it into practice.


Preparation & Regulatory Standards including CDISC

Clinical Data Transparency and Sharing: Update on Research Benefits, Risks and the Future
Matt Becker, SAS

Whether called data transparency or data sharing, there's a movement to give more researchers greater access to patient-level clinical trial data. The goal is to create an environment for innovation in clinical research. Join this presentation to discuss what is being done, including exploring the value to the overall health care system of creating a multi-sponsor environment that gives researchers access to larger pools of data. This is an update to where we stand with clinical data transparency from PharmaSUG China 2014.


SDTM Electronic Submissions to FDA: Guidelines and Best Practices
Christina Chang, PAREXEL
Kyle Chang, PAREXEL

Electronic data submission is the future of clinical trials. The United States Food and Drug Administration (FDA) has released several submission guidance documents since last year. The "Study Data Technical Conformance Guide" provides specifications, recommendations, and general considerations on how to submit standardized study data using FDA-supported data standards. It was developed in an effort to combine the existing Common Issues, Study Data Specifications, and Traceability guidance documents, as well as the Validation Rules, in order to offer one technical document that coordinates all these sources for the industry. This will reduce the likelihood of the FDA requesting data to be represented in a manner that contradicts CDISC rules. It also provides technical recommendations to sponsors for the submission of study data and related information in a standardized electronic format. This paper elaborates on the following fundamental and core components to be considered for FDA submissions: study data submission format, terminology, electronic submission format, data validation, and traceability.


Approaches to Missing Data in the Analysis of SpondyloArthritis International Society (ASAS 20) Response and the Creation of the Related CDISC Compliant Analysis Datasets
Christine Joy Dureza, PPD

Randomized controlled trials (RCTs) are among the preferred study designs for evaluating interventions. Unlike observational studies, the randomization of patients to interventions means that a direct causal association can be made between an intervention and its effect. In order to measure the treatment effect, the patient's response must be measured at least at the end of the trial. In most instances, a series of responses will be measured at baseline and throughout follow-up. However, it is not always possible to collect all the intended data on each individual at all given timepoints, and analyzing the collected data without any further reflection generally leads to misleading conclusions. When the data are incomplete, conclusions about the intervention and response are compromised. In this paper, several approaches to data "missingness" are discussed in the analysis of the ASAS 20 response. This response is used as an outcome measure, a composite measure, in the investigation of ankylosing spondylitis (AS), a chronic disease characterized by ankylosis (stiffening and immobility) of the spine and inflammation at the insertions of tendons. To be fulfilled, the ASAS 20 response criteria require a ≥ 20% improvement and a ≥ 1 unit improvement on a 0 to 10 unit scale in at least 3 of 4 domains, with no worsening of ≥ 20% and ≥ 1 unit on a 0 to 10 unit scale in the fourth domain. The 4 domains are patient global assessment, pain, physical function, and inflammation (morning stiffness), or duration of morning stiffness. Missing data approaches such as non-responder imputation (NRI), last observation carried forward (LOCF), and baseline observation carried forward (BOCF) are presented in this paper, and considerations and recommendations on how to prepare CDISC-compliant analysis datasets that include the abovementioned approaches are provided.
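A minimal sketch of one approach discussed, last observation carried forward (LOCF), using RETAIN within subject/parameter groups (dataset and variable names hypothetical):

    proc sort data=adas; by usubjid paramcd avisitn; run;

    data adas_locf;
       set adas;
       by usubjid paramcd;
       length dtype $8;
       retain _locf;
       if first.paramcd then _locf = .;
       if aval ne . then _locf = aval;   /* remember the latest non-missing value */
       else if _locf ne . then do;
          aval  = _locf;                 /* impute from the previous visit        */
          dtype = 'LOCF';                /* flag the imputation, ADaM-style       */
       end;
       drop _locf;
    run;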


How to validate clinical data more efficiently with SAS Clinical Standards Toolkit
Vivian Feng, SAS R&D Beijing

How do you ensure good quality in your clinical data? Currently there are many tools that can handle simple validation, but when cross-standard validation or other complicated validation is needed, the work becomes very complex. On the other hand, although you can develop validation tools yourself, they are hard to maintain: you have to do a lot of work modifying existing code whenever you need to customize your own validation. This paper presents the SAS Clinical Standards Toolkit (CST), a powerful and flexible toolkit that can help you validate your clinical data, as well as support other work in clinical research activities. CST validation assesses the compliance of data and of the metadata describing the data. It assesses the consistency of values in a specific column, between columns, across records in a specific data set, and across data sets. After the validation process, it presents accurate and clear results. In addition, solutions can easily be customized to meet your needs. CST is a separately orderable component that is available at no additional charge to currently licensed SAS customers; it is free as long as you have SAS Foundation. The latest version, 1.7, is available on SAS 9.4.


Multilingual data support in Dataset-XML with SAS® Clinical Data Integration
Jing Gao, SAS Research & Development (Beijing)

Dataset-XML is a CDISC XML format for exchanging clinical study data between any two entities. That is, in addition to supporting the transport of datasets as part of a submission to the FDA, it may also be used to facilitate other data interchange use cases. For example, the Dataset-XML data format can be used by a CRO to transmit SDTM or ADaM datasets to a sponsor organization. Dataset-XML can represent any tabular dataset, including SDTM, ADaM, SEND, or non-standard legacy datasets. With the growing globalization of drug development, an increasing number of clinical trials are conducted in various countries, so clinical trial data that come from various countries using different languages may need to be processed. On the other hand, CDISC standards are becoming more accepted outside the USA; in particular, SDTM is used in many countries that use other character encodings (e.g., Shift-JIS in Japan) for submissions to local regulatory authorities. In this context, one of the advantages of the Dataset-XML format is highlighted: Dataset-XML supports all language encodings supported by XML. This requires that the related industry solutions support not only US-ASCII characters but also non-ASCII characters in Dataset-XML. This presentation will introduce: 1) how to create Dataset-XML files with multiple encodings (UTF-8, ISO-8859-1, Shift-JIS, etc.) from SAS datasets using SAS Clinical Data Integration (CDI); 2) how to choose the appropriate encoding for the particular languages in Dataset-XML; 3) the SAS macros called by CDI to create Dataset-XML; and 4) whether non-ASCII characters are supported by the Dataset-XML tools (OpenCDISC, XPT2DatasetXML, etc.).


Summary level clinical site data for data integrity review and inspection planning in NDA and BLA submission
Jingwei Gao, Boehringer Ingelheim(China) Investment Co., Ltd.
Nancy Bauer

The presentation focuses on the highlights of the FDA draft guidance on Summary Level Clinical Site Data for CDER's Inspection Planning and its submission requirements. According to the FDA, the guidance "is intended to facilitate use of a risk-based approach for the timely identification of clinical investigator sites for on-site inspection by CDER during the review of marketing applications". The structure of the data set, the data set contents, additional required submitted items, site selection, and future development are discussed in the presentation as well. Per the FDA, the guidance applies to NDAs, BLAs, and NDA and BLA supplemental applications containing new clinical study reports submitted to CDER.


SAS® Tools for Working with Dataset-XML files
Lex Jansen

Dataset-XML is a new CDISC standard that is used to provide study data sets in an XML format. The purpose of Dataset-XML is to support the interchange of tabular clinical research data using CDISC ODM-based XML technologies. This workshop will introduce the standard and present SAS based tools to transform between SAS data sets and Dataset-XML documents. You will learn to create, validate and read Dataset-XML files.


Tips for efficient CDISC eCRT production
Lanting Li
Yu Zhu, PPD
Yuejuan Meng, PPD
Huan Zhu, PPD

The CDISC Case Report Tabulation Data Definition Specification (Define-XML) is one of the primary documents required by the FDA for electronic submission; it describes the content and structure of the data included within a submission. eCRT production helps to increase the level of automation and improve the efficiency of the regulatory review process. In most cases, the original datasets and specifications may not be completely "ready" for direct eCRT development. This causes repetitive modification of the contents and can even have a negative impact on quality. The objective of this paper is to present tips on improving efficiency in eCRT production for two kinds of models (the Study Data Tabulation Model (SDTM) and the Analysis Database (ADB)) as well as for Integrated Summaries of Safety and Efficacy (ISS/ISE). It covers raw data readiness checks before creating the define file, tips on CRF annotation, data specification requirements for eCRT purposes, tips on define creation, supplemental materials preparation, and validation methods. In addition, this paper broaches a standard development workflow that can help automate eCRT production. Key words: CDISC eCRT, specification, define.xml, OpenCDISC


SAS® End-to-End solutions in Clinical Trial
Emma Liu, SAS Beijing R&D

Do you have trouble executing timely queries against historical or ongoing clinical trial data? Is there confusion retrieving data from particular trials beyond generating standard reports and analyses across different systems? Are you looking to build a robust integration routine to bring diverse sources of clinical data together repeatedly for different trials? How do you build a fluent collaboration model with external clinical data management partners? If you answered "YES" to any of these questions, then you need to look at the SAS end-to-end solutions in clinical trials. LEGO toys are popular because they are flexible, reusable, and combinable. Similarly, in this presentation we will discuss the SAS end-to-end solutions in clinical trials; I would also like to present them as "SAS LEGO toys" in terms of playing specific roles in the clinical trial with data collection, data integration, data transformation, data analysis, and data exploration. With a combination of these "SAS LEGO toys" into SAS end-to-end solutions, we will see how SAS optimizes and streamlines clinical trials from the beginning through to working out high-quality submissions to regulators; how to reuse integrated standards over different components of the clinical data flow; how to build collaboration modules that can be outsourced or split up along different departments according to the business processes; and how all of this can be worked out in a secure and regulatory-compliant SAS development environment that lets you spend less time on operational data activities and more time on analysis of the data.


Automatically Generating blankcrf.pdf for Rave Studies
Haiqiang Luo, PPD Inc.

blankcrf.pdf is a critical component of the eCRT for an NDA submission. It is a blank CRF with annotations that document the location and mapping of CRF data to the corresponding dataset names and variable names in the tabulation datasets. Manually creating blankcrf.pdf is common practice. However, it is a tedious, repetitive task that can lead to inconsistency issues among sibling studies. The standardized structure of CRFs produced by the Medidata Rave system makes it feasible to automate the mapping process. This paper describes a method of generating blankcrf.pdf automatically for such studies.


Hands-On ADaM ADAE Development
Sandra Minjoe, Accenture

The Analysis Data Model (ADaM) Data Structure for Adverse Event Analysis was released by the Clinical Data Interchange Standards Consortium (CDISC) ADaM team in May, 2012. This document is an appendix to the ADaM Implementation Guide (IG) v1.0, and describes the standard structure of the analysis dataset used for most of our typical adverse event reporting needs. This hands-on training focuses on creating metadata for a typical adverse event (AE) dataset. Attendees will work with sample SDTM and ADaM data, finding information needed to create the results specified in a sample set of table mock-ups. Variable specifications, including coding algorithms, will be written. Some familiarity with SDTM data, AE reporting needs, SAS® data step programming, and Microsoft Excel is expected. Attendees will also learn how to apply the data structure for analyses similar to adverse events, such as concomitant medications. The ADaM Occurrence Data Structure, currently in draft form at the time of this writing but potentially available in final form by the time of the conference, will also be referenced.


Statistics and Programming in the Globally Evolving Landscape of Clinical Trial Registration and Results (CTRR) Disclosure
Paul Ngai, Xogene Services, LLC
Joyce Hauze, Xogene Services, LLC.

Following up on last year's topic, we will discuss recent changes in the globally evolving landscape of Clinical Trial Registration and Results (CTRR) disclosure. The presentation will be informative for statisticians and management with beginner to advanced skill levels. CTRR disclosure is a cross-disciplinary topic relating to preparation and regulatory standards, management and career development, data analysis and visualization, and statistics including pharmacokinetics. A recent survey of the DIA Clinical Trial Disclosure (CTD) community returned results that show CTRR disclosure is an activity that will continue to grow and require innovative programming shifts and changes throughout the next few years. Governing bodies are rolling out new rules and regulations while we try to comply with them, using new and ever-changing database environments. In the United States, the DHHS invited comments on their Notice of Proposed Rulemaking that outlined the changes they intend to include in the Final Rule before the end of 2015, which will affect how results are disclosed on ClinicalTrials.gov. The EMA released Policy 70 that serves as a bridge between where we are today and where we will need to be to comply with the Clinical Trials Regulation in 2016. EudraCT will be changing dramatically because the new portal will not be ready to "go live" until sometime in 2017. The CFDA is keeping up with change. The original Chinese Clinical Trial Register (ChiCTR) was retired in December 2014. The World Health Organization has accepted the new website as a clinical trial register. We will discuss some of their new processes and standards as well. Discussion will cover the most recent global changes, and their actual and anticipated effect on programming and statistical issues for preparing data tables for "anonymized" Clinical Study Reports, and for disclosure in regulatory databases that continue to lack a harmonized approach.


Reading and Resolving OpenCDISC Messages
Yu Pang, Novartis

When submitting CDISC SDTM data to the FDA, we need to do a lot of work to ensure the compliance of the submission package so that the submission process goes as smoothly as possible. OpenCDISC is one of the most popular tools for CDISC compliance checking. The issues OpenCDISC identifies can be divided into three kinds: those caused by OpenCDISC bugs, those caused by violations of IG/CT rules, and those caused by dirty data. We need to look into the report, find out where each message comes from, and fix what we can. This paper provides a real case of a submission in January 2015, with four studies in one package and OpenCDISC v1.5 in use. It covers how we generated the define package, checked the violation reports and identified false-positive messages, fixed compliance issues in the data via post-processing, kept consistency between the xpt files and define.xml, and described the data issues in the SDRG. From this experience, we learned how to deal with some common OpenCDISC messages, how to fix them, and how to prevent them at an early phase.


Generating Define.xml Using SAS® via an Element-by-Element and Domain-by-Domain Mechanism
Lina Qin, Independent Consultant

An element-by-element and domain-by-domain mechanism is introduced for generating define.xml using SAS®. Based on CDISC Define-XML Specification, each element in define.xml can be generated by applying a set of templates instead of writing a great deal of "put" statements. This will make programs more succinct and flexible. The define.xml file can be generated simply by combining, in proper order, all of the relevant elements. As each element related to a certain domain can be separately created, it is possible to generate a single-domain define.xml by combining all relevant elements for that domain. This mechanism greatly facilitates generating and validating define.xml.
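A hedged sketch of the template idea: hold each element as a template string with placeholders and resolve it per metadata row, rather than writing many PUT statements (metadata dataset and variable names hypothetical, all character):

    /* Resolve an ItemDef template against variable-level metadata */
    data itemdefs;
       length template elem $400;
       retain template '<ItemDef OID="IT.&DOM..&VAR." Name="&VAR." DataType="&TYPE."/>';
       set varmeta;    /* hypothetical columns: domain, varname, datatype */
       elem = tranwrd(template, '&DOM.',  strip(domain));
       elem = tranwrd(elem,     '&VAR.',  strip(varname));
       elem = tranwrd(elem,     '&TYPE.', strip(datatype));
       keep elem;
    run;

    /* Combine, in proper order, all relevant element rows into the file */
    data _null_;
       set itemdefs;
       file 'define.xml';
       put elem;
    run;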


Generic Macros for Data Mapping
Qian Zhao, Johnson & Johnson Consumer Companies, Inc.
John Wang, Johnson & Johnson China
Ruofei Hao, Johnson & Johnson Consumer Companies, Inc.

Data mapping and its QC can be a daunting task. When data are mapped, e.g., to SDTM, the variable values, variable names, variable types, and data structures may all change. Writing generic macros that perform data mapping, and QC of the mapping, for any data set may seem impossible. This paper explores a technique that uses a combination of external control files and generic macros to perform data mapping and QC; the technique can be extended to other uses. Two examples illustrate the concept. In the first example, we demonstrate how to QC a data set mapped from a horizontal structure to a vertical structure by isolating a single variable at a time and cycling through all variables. In the second example, we demonstrate how to write a macro that generates the mapping program from the data mapping specification.


Programming Techniques

An Excel Macro for Quick and Efficient Analysis of SAS® Log Messages
Lingyun Chen
Li Cheng, Vertex Pharmaceuticals

SAS log check programs provide summaries of the log issues in a SAS program. They require batch submission of a program after it has already been developed and cannot be used during program development in the SAS interactive environment. To achieve similar functionality while the program is still being developed interactively, an Excel macro was created using VBA code. This macro provides a real-time overview of log issues with the click of a button. It also provides drill-down capability to quickly locate targeted log messages. The macro has a simple user interface and is customizable to include search strings that are of interest to individual users. It provides quick and efficient analysis of log messages to aid users in debugging SAS programs while still in the SAS interactive environment.


Handling multiple y axes using SAS® Graphs
Mina Chen
Peter Eberhardt, Co-author

Graphics are a powerful way to display clinical trial data, and graphs are widely generated in the pharmaceutical industry to review the results of clinical trials. For example, a mean plot is often used to show how laboratory results change over time. Sometimes there is a need to display the changes of two laboratory parameters over time within one mean plot, and a second y-axis is useful for plotting data with different scales on the same plot. This paper will show, by example, how such a graph can easily be created using SAS/GRAPH®.
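A minimal SAS/GRAPH sketch: the PLOT2 statement in PROC GPLOT adds a right-hand y-axis for a second parameter (dataset and variable names hypothetical):

    proc gplot data=labmeans;
       symbol1 interpol=join value=dot;
       symbol2 interpol=join value=square;
       axis1 label=(angle=90 'ALT (U/L)');
       axis2 label=(angle=90 'Bilirubin (umol/L)');
       plot  alt_mean*visitnum  / vaxis=axis1;   /* left axis  */
       plot2 bili_mean*visitnum / vaxis=axis2;   /* right axis */
    run;
    quit;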


Novel Programming Methods for Change from Baseline Calculations
Mina Chen
Peter Eberhardt, Co-author

In many clinical studies, change from baseline is a common measure of safety and/or efficacy in the clinical data analysis. There are several ways to calculate change from baseline in a vertically structured data set, such as the RETAIN statement, arrays, a DO loop in DATA steps, or PROC SQL. However, most of these techniques require operations such as sorting, searching, and comparing, which are some of the more computationally intensive and time-consuming operations. Consequently, an understanding of these techniques and a careful selection of the specific method can often save the user a substantial amount of computing resources. This paper will demonstrate a novel way of calculating change from baseline using hash objects.
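A minimal sketch of the hash approach: load baseline values once, then look them up for each post-baseline record (dataset and variable names hypothetical):

    data chg;
       if _n_ = 1 then do;
          if 0 then set baseline(keep=usubjid paramcd base);
          /* Baseline values keyed by subject and parameter */
          declare hash bl(dataset:'baseline');
          bl.definekey('usubjid', 'paramcd');
          bl.definedata('base');
          bl.definedone();
       end;
       set advs;                                  /* post-baseline records */
       if bl.find() = 0 then chg = aval - base;   /* change from baseline  */
       else call missing(base, chg);
    run;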


Saving Typing with SAS® Editor Abbreviations
Mina Chen

Accelerating analysis and faster data interpretation help us stay competitive in the drug development field, which requires statistical programmers to identify fast and efficient ways to deliver analysis results in a timely manner. Imagine you are at your computer working on an urgent task. You need to write code for a SAS procedure but you do not remember the syntax completely. It is time-consuming to search the SAS manuals or the SAS Help website to look up the syntax, code by trial and error, and type tediously. In such situations, SAS® abbreviations are an easy, quick, and efficient way to reduce programming time. With just a few keystrokes, the proper syntax is inserted automatically into your SAS program. Abbreviations can also be shared with other SAS programmers for additional efficiency. This paper will introduce how to create abbreviations in the SAS® Enhanced Editor and provide some useful examples of abbreviations for our daily work.


Perl Regular Expressions - A Powerful Tool to Manage Text String
Weston Chen, Novartis

Perl Regular Expressions (PRX) were introduced to SAS mainly for locating patterns in text strings. They enable SAS to perform special actions based on searches built from metacharacters. Compared to the traditional string functions, the advantage of PRX is that it provides a much more compact solution to a complicated string manipulation task, especially when dealing with highly unstructured and complicated data streams. It is a powerful tool that can help you work efficiently when managing text strings. This paper will show what the metacharacters are, how to design and create an efficient PRX pattern, and how to apply PRX in clinical trial programming examples.
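A small hedged example of the technique: extract a dose and unit from free-text medication records with capture buffers (dataset and variable names hypothetical):

    data doses;
       set cm;             /* cmdose_txt holds free text, e.g. "Aspirin 100 mg daily" */
       if _n_ = 1 then rx = prxparse('/(\d+(?:\.\d+)?)\s*(mg|mcg|g)\b/i');
       retain rx;
       drop rx;
       if prxmatch(rx, cmdose_txt) then do;
          dose = input(prxposn(rx, 1, cmdose_txt), best12.);   /* first capture buffer  */
          unit = lowcase(prxposn(rx, 2, cmdose_txt));          /* second capture buffer */
       end;
    run;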


A Macro to Automatically Select Covariates from Prognostic Factors and Exploratory Factors for Multivariate Cox PH Model
Yu Cheng, Eli Lilly and Company

The multivariate Cox PH model is a widely used analysis that estimates hazard ratios by constructing a model based on variables selected from among a large number of factors. The selection algorithm is normally simplified as: 1) use a full model to include all potential prognostic factors and exploratory variables, 2) select the covariates that are significant at a pre-specified alpha level based on a certain selection method, 3) fit a reduced model with only the selected variables plus any variables forced into the final model, 4) repeat the above steps and construct other models on different combinations of covariates. This paper presents a macro that automatically goes through this process and generates final reports for clinical trial reporting use. Through the examples described, the aim is to provide an approach other programmers can use to automatically generate a batch of analysis reports in the shortest possible time.
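A hedged sketch of one building block of such a process: stepwise covariate selection in PROC PHREG, with the treatment effect forced to stay in the model (dataset and variable names hypothetical):

    proc phreg data=adtte;
       class trtp sex region / param=ref;
       /* INCLUDE=1 forces the first listed effect (TRTP) into every model */
       model aval*cnsr(1) = trtp age sex region baseecog
             / selection=stepwise slentry=0.25 slstay=0.15 include=1;
    run;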


An efficient way to manipulate huge number of program files in a project
Long Fang, PPD

It is common for hundreds of Tables, Listings, and Figures (TLFs) to be generated in one clinical trial project. When double programming and batch submission are involved, with the addition of log files and lst files, the number of SAS-related files can triple or grow even more. Consequently, it becomes a headache to identify a specific file within this massive number of files. When a version control system is used, the system requires considerable time to track file versions while reviewing files and folders in a file explorer. Locating and opening a specific file becomes even more complex when filenames are not consistent with the TLF numbers. In this paper, some SVN commands, SAS DM commands, and macros are introduced to improve efficiency when dealing with massive numbers of files.


An approach to fostering an environment of good programming
James Gallagher, Novartis
Nancy Ni

The pharmaceutical industry faces continued and increasing demands to drive down costs and optimize operational activities. It was recognized within the Statistical Reporting team at Novartis General Medicines that statistical programmers can contribute to these demands by actively practicing good programming principles. This paper discusses an initiative run within Novartis called "Novartis Programming Practice", whose primary goal was fostering an environment of good programming practice. Incidentally, the team anticipated an indirect positive by-product of the initiative in potentially reinvigorating passion for the "art of programming", and this is also covered in the paper. Primarily, the paper details the initial framing of the initiative, its content, the training methodology, the embedding strategy, and early indicators of outcomes. Finally, the authors express some ideas on how PhUSE could take its Good Programming Practice initiative to the next level. The Statistical Reporting Leadership Team endorsed the proposed "Novartis Programming Practice" initiative, with the initial goal of reinforcing adherence to better programming standards by delivering a set of trainings and redefining our working practice. Two interactive workshops were designed and delivered. The first workshop was designed to convey the importance of programming practice in the Novartis environment and to better understand how different software development factors (SDFs) contribute to it. During the first workshop, attendees described what excellence looks like for each factor (eight factors were covered in total), based on different scenarios (e.g., how important is efficiency in the context of Health Authority questions?). The workshop demonstrated that quite often a trade-off occurs between two or more SDFs, and finding the optimal balance is often driven by context. The second workshop was more technical and focused on efficiency. Different tips and techniques were presented, followed by practical exercises and/or discussions. The SAS Institute also helped us select the appropriate techniques. The paper discusses the training methodology and logistics preparation for both workshops. One other key component was the embedding strategy discussed within the team. Quite often, colleagues leave training with good intent, but project work consumes them and the training rarely manifests in frequent application of the learnings. The paper therefore also discusses some of the embedding strategies implemented by the team and gives an early indication of some positive outcomes. ACKNOWLEDGMENTS: With many thanks to James Gallagher for his support, encouragement, and thorough review.


Tips and Techniques for Creating Output with Indeterminate Columns on Multiple Pages
Hui He
Yin Ling, Author
Shunhua Wu, Author

In clinical studies, data listings can have an indeterminate number of columns spanning multiple pages, and if there are too many columns to fit on the same page, the programming process becomes even more challenging. The general options (ID/FLOW) in PROC REPORT can no longer handle these customized outputs. We will introduce a macro program, with tips and techniques, that manages the output across pages in a scientifically meaningful way and with a clean layout that is easy to read.


It's Not Just Fast Merging, It's Powerful Hashing
Mijun Hu, Novartis Pharma Co., Ltd.

Ever since the hash object was introduced in SAS® 9, a number of papers have commended its high merging efficiency but overlooked its other powerful features. In fact, besides the widely used FIND method for performing a merge, we can also utilize the CHECK, FIND_NEXT, ADD, and OUTPUT methods, among others, to achieve certain purposes with higher efficiency and readability but less coding. The advantage is most obvious in large databases such as ADLB, because combining multiple hash operations within one data step eliminates several complete copies of the database. This paper is intended for intermediate-level hash users who mainly use the FIND method but would like to explore other functionality of the hash object through real examples.
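A small sketch of two of the less-used methods, ADD and OUTPUT: build a de-duplicated key table in a single pass and write the hash out as a dataset (dataset and variable names hypothetical):

    data _null_;
       if 0 then set adlb(keep=usubjid paramcd);
       declare hash h(ordered:'a');
       h.definekey('usubjid', 'paramcd');
       h.definedata('usubjid', 'paramcd');
       h.definedone();
       do until (eof);
          set adlb end=eof;
          rc = h.add();      /* ADD rejects duplicate keys: first occurrence kept */
       end;
       rc = h.output(dataset:'subj_param');   /* OUTPUT writes the hash to disk   */
       stop;
    run;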


A Multi-processing Tool to Batch Submit a List of Programs with Real Time Feedback and Dashboard Email Notification
Huashan Huo, Pharmaceutical Product Development, LLC.
Fanyu Li

In clinical research, it is very common for a large number of SAS programs to be repeatedly batch run due to program modifications or new data updates. In the past few years, several papers authored by pharmaceutical industry programmers (Gilbert Chen 2002; Shu 2006; Prescod Cawley 2010; Wong Sun 2010; Conover 2011; Andrew E. Hansen 2013) were published in this area describing methods and tools for automating this process. The purpose of this paper is to introduce a tool for multi-processing batch submission of a large number of SAS programs concurrently. Overall execution time is significantly reduced by using this tool. Users also get real-time status feedback on SAS execution progress and an email notification with a dashboard report on completion status once the batch run completes.
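The core idea can be sketched with SYSTASK, which launches programs concurrently and waits for them all (program names hypothetical; the paper's tool adds monitoring and notification on top):

    /* Launch three SAS programs in parallel, then wait for all of them */
    systask command "sas -sysin prog1.sas -log prog1.log" nowait taskname=t1 status=rc1;
    systask command "sas -sysin prog2.sas -log prog2.log" nowait taskname=t2 status=rc2;
    systask command "sas -sysin prog3.sas -log prog3.log" nowait taskname=t3 status=rc3;
    waitfor _all_ t1 t2 t3;
    %put NOTE: return codes are &rc1 &rc2 &rc3;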


Creating Binary Tables in a Snap
Yan Liu, Sanofi

Summary tables for binary variables are very common in clinical trial statistical analysis. Conventional methods usually require stacking many rows of binary-variable analysis output to get the desired results, which is time-consuming and inefficient from a programming point of view. This paper introduces a convenient SAS macro that can create binary tables with one simple macro call and is capable of using either the subject-level (ADSL) or Basic Data Structure (BDS) ADaM datasets as input, or a combination of the two types in case of mixed requirements.


Fifty Shades of Sorting
Haibin Shu, AccuClin Global Services LLC
Elena Rojco, Denta Quest
John He, AccuClin Global Services LLC

The title "Fifty Shades of Grey" refers to the many facets of the main character Grey's personality; likewise, the tricks and tips of sorting variables in SAS go way beyond the syntax of PROC SORT. In many circumstances a good order does matter, so customized sorting variables should be adopted to achieve the desired sorting effect. A variety of examples will be presented to illustrate in detail why customized sorting variables are needed in specific situations and how they are derived accordingly. Further, the ideas can be generalized to prototype and establish a systematic approach. Because the effect of reporting and presenting information often depends on whether a good order is in place, SAS programmers may use PROC PRINT, PROC REPORT, or even a DATA step with these sorting techniques to consistently solve certain challenging issues.


The loop within: A macro on looping variables and observations
Eduard Joseph Siquioco, PPD

Producing the same outputs for different treatments, patients, or phases can be tedious when you type a lot of code when in fact you need only one program. This looping macro will help in producing these kinds of outputs for SDTM or TLFs. The macro will also show how variables can be used in the looping calls for a more customized transposition of a dataset. The paper introduces the basics of looping and iterating inside macros and includes a walkthrough of the macro being used for variables and observations.
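A minimal sketch of the looping pattern: iterate over a space-separated list and generate one call per item (macro, dataset, and variable names hypothetical):

    %macro bytrt(data=, trtlist=);
       %local i trt;
       %do i = 1 %to %sysfunc(countw(&trtlist, %str( )));
          %let trt = %scan(&trtlist, &i, %str( ));
          proc print data=&data(where=(trtp="&trt"));
             title "Listing for treatment &trt";
          run;
       %end;
    %mend bytrt;

    %bytrt(data=adsl, trtlist=Placebo DrugA DrugB)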


A macro to re-size character variable length
Jianjun Tan, Sanofi
Eric Liao, Sanofi

The FDA has a limit on the size of submitted data: a single data file should be no greater than 1 GB. However, data files greater than 1 GB are common in real submissions, so there are requests to resize the data files to meet the FDA's requirement. To facilitate and automate the resizing of data files, a macro was developed, as specified in the FDA/PhUSE document "Data Sizing Best Practices Recommendation", to optimize dataset size by managing character variable lengths and removing wasted space. This macro automatically determines the maximum number of characters used and limits the lengths of character variables for multiple datasets. After limiting the lengths, the macro also ensures that no data are truncated.
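The single-variable core of the idea, as a hedged sketch (the macro generalizes this across all character variables and datasets; names hypothetical):

    /* Find the longest value actually used, then re-declare the length */
    proc sql noprint;
       select max(length(aeterm)) into :maxlen trimmed
       from ae;
    quit;

    data ae_resized;
       length aeterm $&maxlen;    /* expect a multiple-lengths warning here */
       set ae;
    run;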


SAS Longitudinal Data Techniques - From Change from Baseline to Change from Previous Visits
Chao Wang

Longitudinal data are often collected in clinical trials to examine the effect of treatment on the disease process over time. The most common operation against this type of data is calculating the change from baseline (CFB). Proven ways to implement this calculation in SAS include data splitting, a one-pass DOW (Dorfman-Whitlock DO) loop, and RETAIN plus the LAG function. Sometimes, in addition to CFB, change from the previous visit (CFP) can be an interesting endpoint for a trial. This paper demonstrates how CFP can be derived with minimal coding by using the SAS LAGn function.
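A minimal sketch of the LAG pattern for CFP; note that LAG must execute on every row, so the queued value is fetched first and then reset at BY-group boundaries (dataset and variable names hypothetical):

    proc sort data=advs;
       by usubjid paramcd avisitn;
    run;

    data cfp;
       set advs;
       by usubjid paramcd;
       prev = lag(aval);                 /* value from the previous record      */
       if first.paramcd then prev = .;   /* do not carry across subjects/params */
       cfp = aval - prev;                /* change from the previous visit      */
    run;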


Using GSUBMIT command to customize the interface in SAS®
Xin Wang, Fountain Medical Technology Co., ltd

One of the reasons SAS is widely used as the statistical analysis tool for clinical trial data is that SAS code can be documented and rerun at a later time. During our daily work as SAS programmers, we often write lots of small scripts to debug code or explore data, and these scripts may be used repeatedly in the same programs or in different programs. Here we introduce the SAS command GSUBMIT, which allows us to customize the SAS interface and run a short SAS program with one click. As a result, those small scripts can be used easily and efficiently.
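GSUBMIT is a display-manager command, so it is typically bound to a key or toolbar button rather than placed in a program. A hedged sketch of two KEYS-window definitions (the snippet texts are hypothetical):

    F12      gsubmit "proc print data=&syslast. (obs=10); run;"
    SHF F12  gsubmit buf=default

The first runs a quick peek at the most recently created dataset; the second submits whatever is currently on the clipboard.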


Application of Output Delivery System in Creating Customized Targets
Yuan Wang, Fountain Medical Development, Inc

The default output from SAS/STAT procedures contains much useful information, but organizing these output objects can be confusing, and customizing the summary results into the required clinical output can be tedious. The SAS Output Delivery System (ODS) is a powerful tool for dealing with all of these issues. In this paper, we present how to use ODS to select a statistical output object, convert it into a dataset, organize the results into a clean RTF output, and add special symbols to explain the statistical model.
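A minimal sketch of the workflow: trace the object names, capture one table as a dataset, and route a formatted version to RTF (dataset and variable names hypothetical):

    ods trace on;                      /* log the names of output objects */
    proc ttest data=advs;
       class trtp;
       var chg;
       ods output TTests=tt_results;   /* capture the t-test table        */
    run;
    ods trace off;

    ods rtf file='ttest_summary.rtf';
    proc print data=tt_results noobs label;
    run;
    ods rtf close;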


A customized tool for systematic input/output dependency diagnose
Chunlan Xing
Chao Wang

After the final database lock in a clinical study, when facing changes in one or a few raw datasets through database re-open/re-lock, the study team needs to make a strategic decision based on the impact of these changes on the analysis datasets and summary outputs. If the team decides to re-run only those analysis datasets and outputs that are related to the raw data changes, the dependency between the changed raw datasets and the analysis datasets and outputs needs to be diagnosed. A systematic diagnosis of such dependencies ensures that all related programs are re-run. In this paper, SAS I/O functions and SAS dictionary techniques are used to create a customized report for dependency checking.
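The simplest form of the idea can be sketched by scanning a program file for references to a changed raw dataset (filenames and the search token hypothetical; the paper's tool systematizes this with I/O functions and dictionary tables):

    filename prog 'ae_table.sas';

    data hits;
       infile prog truncover;
       input line $char256.;
       progline = _n_;
       if find(upcase(line), 'RAW.AE') then output;   /* flag dependent lines */
    run;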


Seamless Data Exchange between SAS and MS Word Documents Through the Integration of VBS in SAS Programs
Jia Yang
Stanley Wei, Novartis

SAS programmers in pharma use SAS as the primary platform to process clinical trial data and perform statistical analyses. More and more requests are now raised to seamlessly exchange data with external files in SAS, incorporate those steps into standard SAS code, and build a non-stop, automatic workflow. However, some of these requests are not easily met because of uncommon data formats or other non-standard data sources; handling such data separately introduces inflexibility and reduces the efficiency of report generation and statistical analysis. In this paper, we explore the capability of SAS to manipulate external files or data sources by generating and executing VBScript code inside the SAS environment. The FILENAME statement and its applications in retrieving data from external data sources are also demonstrated, showing how data can be exchanged between external sources and SAS and how these processes enable automatic flow control and improve efficiency.
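A hedged sketch of the pattern: write a small VBScript with PUT statements, then execute it with the X command (file paths and the Word conversion are hypothetical examples):

    filename vbs 'extract.vbs';

    data _null_;
       file vbs;
       put 'Set wrd = CreateObject("Word.Application")';
       put 'wrd.Documents.Open "C:\temp\specs.docx"';
       put 'wrd.ActiveDocument.SaveAs2 "C:\temp\specs.txt", 2';   /* 2 = plain text */
       put 'wrd.Quit';
    run;

    options noxwait;
    x 'cscript //nologo extract.vbs';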


A Macro to Add Variables to SDTM Standard Domains
Xianhua Zeng, PAREXEL

In SDTM domains, all character variables are limited to a maximum of 200 characters because the FDA requires datasets in SAS v5 transport format. Text longer than 200 characters should be stored as records in the SUPP-- dataset. However, the Comments (CO) and Trial Summary (TS) domains are allowed additional variables for the purpose of handling text exceeding 200 characters. To improve readability, the text should be split between words, not simply broken into 200-character pieces; i.e., when text is longer than 200 characters in the CO domain, additional variables COVAL1-COVALn are derived: the first up-to-200 characters of the comment go in COVAL, the next in COVAL1, and additional text is stored as needed through COVALn. This paper presents the AddVar macro, which splits a long text variable into a set of smaller variables without truncating an intact word and automatically generates COVAL-COVALn in the CO domain or TSVAL-TSVALn in the TS domain. AddVar checks that the user inputs are valid and that the specified split character does not already exist in the input variable. The newly created variables are added to the output data set.
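A hedged sketch of a word-boundary split into COVAL, COVAL1, ... (the full-length source variable COTEXT and the fixed array size are hypothetical):

    data co_split;
       set co;
       length coval coval1-coval3 $200 _piece $200;
       array vals{*} coval coval1-coval3;
       _start = 1;
       do _i = 1 to dim(vals) while (_start <= lengthn(cotext));
          _len = min(200, lengthn(cotext) - _start + 1);
          _piece = substrn(cotext, _start, _len);
          if _len = 200 then do;
             _cut = findc(_piece, ' ', 'b');    /* last blank in the 200-char window */
             if _cut then do;
                _piece = substrn(_piece, 1, _cut - 1);
                _len = _cut;                    /* advance past the blank as well    */
             end;
          end;
          vals{_i} = _piece;
          _start + _len;
       end;
       drop _: ;
    run;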


Configuring SAS® Business Intelligence (BI) client with the SAS® server to support multilingual data
Wei Zheng, SAS Institute Inc.

As the pharmaceutical industry has become an increasingly global enterprise, more and more companies are processing multilingual data. In order for a SAS® Business Intelligence client to support this type of data, the language environment must be configured correctly between the SAS® client and the SAS server. Proper configuration includes the correct locale setting on the client side and the correct encoding on the server side. This paper, written for SAS users and SAS administrators, introduces several methods for configuring an effective and optimized multilingual working environment for SAS clients such as SAS® Enterprise Guide®, SAS® Data Integration Studio, and SAS® Enterprise Miner™. One item of particular interest in the paper is a unique approach to setting the workspace server to different encoding values on a per-user basis. The methods that are discussed apply to SAS servers in both the Microsoft Windows and UNIX operating environments. Each method's advantages and disadvantages are detailed, along with an example scenario that illustrates when the method is best used. SAS® Enterprise Guide® 7.11 and SAS® 9.4 Unicode Server are used, respectively, as the client and server in the examples.
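As a hedged illustration of the server-side half, two SAS system options in the server's configuration file control the session encoding and locale (the values shown are examples only):

    /* sasv9.cfg (server side): start the session as a UTF-8 server */
    -encoding utf-8
    -locale zh_CN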


Statistics including Pharmacokinetics

Survival Plots using SAS PROC LIFETEST, GPLOT, and SGPLOT: What Are Their Differences?
Ka Chun Chong, Shenzhen Research Institute, The Chinese University of Hong Kong
Chung Ying Zee, The Chinese University of Hong Kong

The Kaplan-Meier survival curve is a useful non-parametric approach to summarizing time-to-event data, such as overall survival in cancer studies. In the SAS system, the LIFETEST, GPLOT, and SGPLOT procedures are common ways to generate survival curves. However, the programming logistics differ among these procedures. In this paper, the differences will be demonstrated with examples.
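For reference, the most direct of the three routes is PROC LIFETEST's built-in ODS graph (dataset and variable names hypothetical):

    proc lifetest data=adtte plots=survival(atrisk);
       time aval*cnsr(1);    /* CNSR=1 marks censored records */
       strata trtp;
    run;

The GPLOT and SGPLOT routes instead plot the survival estimates exported via OUTSURV= or ODS OUTPUT, which is where the procedures' logistics diverge.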


Hands-on Tutorial for Piecewise Linear Mixed-effects Models Using SAS® PROC MIXED
Qinlei Huang, St Jude Children's Research Hospital

Clinical trials and public health studies focus on studying changes associated with interventions, events, or critical periods in human development. Evaluating the impact of "critical" or high-risk events/periods in longitudinal studies of growth may provide clues to the long-term effects of life events and the efficacy of preventive/therapeutic interventions. Conventional linear longitudinal models typically involve a single growth profile to represent linear changes in an outcome variable across time, which sometimes does not fit the empirical data. Piecewise linear mixed-effects models allow different linear functions of time for the pre- and post-critical-time-point trends. This hands-on tutorial first introduces how to fit piecewise linear mixed-effects models using SAS PROC MIXED step by step, in the context of a clinical trial with two-arm interventions and a predictive covariate of interest; it then illustrates how to obtain the slopes and corresponding p-values for intervention and control groups during the pre- and post-critical periods, conditional on different values of the predictive covariate; and third, it explains how to make meaningful comparisons and present results in a scientific manuscript. Illustrative SAS commands are provided to fit piecewise linear mixed-effects models and to generate summary tables assisting the interpretation of the results.
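A hedged sketch of the piecewise setup the tutorial describes: split time at the critical point (knot) and fit both segments in PROC MIXED (dataset, variable, and knot values hypothetical):

    %let knot = 6;    /* critical time point, e.g. month 6 */

    data long2;
       set long;
       t_pre  = min(time, &knot);        /* carries the pre-knot slope  */
       t_post = max(time - &knot, 0);    /* carries the post-knot slope */
    run;

    proc mixed data=long2;
       class usubjid trt;
       model y = trt t_pre t_post trt*t_pre trt*t_post / solution;
       random intercept t_pre t_post / subject=usubjid type=un;
    run;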


Hands-on Tutorial for Propensity Score Methods in Pharmaceutical Research Using SAS®
Qinlei Huang, St Jude Children's Research Hospital

In pharmaceutical and public health research, intervention evaluations are often based on observational studies when randomization is infeasible. In observational studies, subjects often enroll in an intervention by self-decision instead of random assignment, causing the problem of "selection bias." Propensity score methods offer an efficient way to reduce selection bias by balancing the intervention and control groups on a single scalar. This paper gives a tutorial on propensity score methods, from the statistical background to hands-on applications using SAS®. It first explains how to calculate a propensity score and perform a balancing evaluation using SAS® procedures. It then illustrates four different methods of applying propensity scores, step by step in SAS®: matching, weighting, stratification, and regression adjustment. Hands-on examples with SAS code are provided.
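A minimal sketch of two of the four steps: estimate the score with PROC LOGISTIC, then form inverse-probability-of-treatment weights (dataset and variable names hypothetical):

    /* Step 1: propensity score = P(intervention | covariates) */
    proc logistic data=cohort descending;
       class sex / param=ref;
       model intervention = age sex baseline_sev;
       output out=ps_data p=ps;
    run;

    /* Step 2 (weighting): IPTW for treated and control subjects */
    data ps_wt;
       set ps_data;
       if intervention = 1 then iptw = 1 / ps;
       else iptw = 1 / (1 - ps);
    run;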


Comparison of Three Methods for Calculating Confidence Intervals for Rate Differences in Non-inferiority Clinical Trials
Yaohua Huang

Non-inferiority clinical trials are widely used in pharmaceutical product development. When the primary endpoints are categorical variables, confidence intervals for rate differences are usually calculated. These confidence intervals are compared with the pre-specified non-inferiority margin to judge whether non-inferiority can be concluded. The approximate normal, exact probability, and Newcombe-Wilson score methods are introduced in this paper to calculate the confidence intervals for rate differences. In addition, a power analysis of these methods is conducted with Monte Carlo simulation. If the sample size is planned using the approximate normal method, all three methods can provide enough power when at least one group's rate is neither 0% nor 100%. If both groups' rates are 0% or 100%, only the Newcombe-Wilson score method can provide a confidence interval, but the power is not sufficient.
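For orientation, PROC FREQ can produce the Wald (approximate normal) and Newcombe confidence limits directly, and the EXACT statement adds the exact interval (dataset and variable names hypothetical):

    proc freq data=trial;
       tables trt*response / riskdiff(cl=(wald newcombe));
       exact riskdiff;   /* exact unconditional limits; can be slow */
    run;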


Clinical trials optimization: Monte Carlo Simulation modeling and SAS applications
Ye Meng

Modeling and clinical trial simulation are tools used by pharmaceutical companies and the FDA to improve the efficiency of drug development. Monte Carlo simulation is a modern and computationally efficient algorithm, which makes it a useful technique for modeling the patient recruitment process and dose calculation in clinical design. The purpose of this paper is to describe how Monte Carlo simulations can be tasked with evaluating parameter distributions, and their applications in PROC MI. We also describe two approaches, BY-group processing and PROC IML, each with an example in SAS, that lead to short and fast simulation programs usable in many practical situations.
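A minimal sketch of the BY-group approach: simulate many trials in one dataset, analyze them all with a single BY-processed procedure call, and summarize (all parameters hypothetical):

    data sim;
       call streaminit(2015);
       do sim = 1 to 1000;                            /* 1000 simulated trials */
          do i = 1 to 50;                             /* 50 subjects per arm   */
             trt = 0; y = rand('normal', 0,   1); output;
             trt = 1; y = rand('normal', 0.5, 1); output;
          end;
       end;
    run;

    ods exclude all;               /* suppress printed output; ODS OUTPUT still works */
    proc ttest data=sim;
       by sim;
       class trt;
       var y;
       ods output TTests=res;
    run;
    ods exclude none;

    /* Empirical power = share of simulations with p < 0.05 */
    proc sql;
       select mean(probt < 0.05) as power
       from res
       where method = 'Pooled';
    quit;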


Why the Statistical Analysis Plan (SAP) Should Be Comprehensive
Riddhi Merchant, Pharmaceutical Products Development
Ranjith Prayankotveettil, Pharmaceutical Products Development (PPD)

A Statistical Analysis Plan (SAP) is one of the key regulatory documents; it comprises the detailed statistical methods and techniques used for data analysis. It serves to inform readers about the clinical study design, the execution of statistical methods and techniques, the data analysis, and the reporting. But is the SAP always comprehensive enough to proceed with the data analysis? As statisticians, do we write the SAP keeping in mind, in depth, all the statistical methods that best correspond to the research hypothesis? This paper will discuss the study statistician's responsibility in preparing an SAP that describes detailed procedures for executing the statistical analysis of the primary and secondary variables, which is required to convey the statistical results obtained from the stated statistical methods.


The Application of Tolerance Interval in Defining Drug Response for Biomarker
Jing Pan

This paper introduces tolerance intervals and shows how tolerance limits can be generated using the CAPABILITY procedure. It also uses an example to describe how tolerance intervals can be applied to define a drug response by calculating a specific cutpoint for a certain biomarker.
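A hedged sketch based on the INTERVALS statement of PROC CAPABILITY, whose METHODS=3 requests an approximate normal-theory tolerance interval (dataset and variable names hypothetical; see the procedure documentation for the method definitions):

    proc capability data=biomk;
       var marker;
       /* Tolerance interval covering at least 90% of the population, */
       /* with 95% confidence                                         */
       intervals marker / methods=3 alpha=0.05 p=0.90;
    run;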


Use of SAS® for Risk-Based Monitoring of Survival Endpoints and Adverse Event Data in Clinical Trials
Zhizhuo Zhang, The Chinese University of Hong Kong
Sichan Tang, The Chinese University of Hong Kong
Ka Chun Chong, Shenzhen Research Institute, The Chinese University of Hong Kong
Chung Ying Zee, The Chinese University of Hong Kong

Risk-based monitoring (RBM) can greatly reduce the cost of a clinical trial while protecting patient safety and data quality. Primary endpoints and protocol-required safety measurements are usually identified as critical data for risk assessments. In this paper, we present how to use basic SAS tools to detect potential errors in survival and adverse event data across sites during centralized monitoring. These types of errors are generally recognized as more important than others because they can profoundly affect study results and patient safety. This demonstration of RBM can thus enhance the overall efficiency of monitoring activities.