---
title: "Overview"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, echo=FALSE, eval=TRUE, message=FALSE}
library(Pmetrics)
library(gt)

r_help <- function(pkg, name) {
    glue::glue("[`{name}`](https://rdrr.io/pkg/{pkg}/man/{name}.html)")
}

gh_help <- function(name) {
    glue::glue("[`{name}`](https://lapkb.github.io/Pmetrics/reference/{name}.html)")
}


custom_table <- function(tab){
  #system.file("extData", tab, package = "Pmetrics") %>%
    read.csv(file.path("Data",tab), na.strings = ".") %>% gt() %>%
    tab_style(
      style = list(
        cell_fill(color = "black"),
        cell_text(color = "white", weight = "bold")
      ),
      locations = cells_column_labels(everything())
    )
}

pmetrics <- function(){
    knitr::asis_output("[Pmetrics]{style=\"color: #841010; font-family: 'Arial', Arial, sans-serif; font-weight: 900;\"}")
}

```

This chapter provides an overview of the architecture of the `r pmetrics()` package, including its
main software engines, control functions, and other functions for data manipulation,
model selection, diagnostics, and plotting.


## R6 architecture

**As of v. 2.0**, `r pmetrics()` uses an "R6" architecture less dependent on
reading and writing files, preserving hard drive space and speeding execution,
since file I/O is slow. The goal is to simplify work within a session by
eliminating the need to repeatedly copy and paste files from one folder to the next
whenever a new fit is executed. Storing critical objects in memory means they
can be used without accessing files.

However, data and model files are still used for longer-term storage
and for preserving work from one session to another. The format of those files,
both data and model, will be familiar to long-term Pmetrics users.

Data files are generally .csv and the format is detailed in the chapter on [data](data.qmd).
A model can be read from a text file or can be defined directly in R.
The easiest way to accomplish this is with our model builder app. Whether choosing
to define models in R, with the builder app, or in a .txt file, details can be
found in [models](models.qmd).

:::{.callout-note}
The R6 architecture is object-oriented, meaning that data and functions
are bundled together into objects. Objects have **fields** which contain data,
and **methods** which are functions that operate on the data. Both are accessed with the `$` operator.
This is different from the previous
functional programming approach (called "S3" in R) used in Legacy `r pmetrics()`, where data and functions
were separate. The R6 architecture allows for better organization of code,
easier maintenance, and improved performance.
:::

For example, a `PM_data` object has fields such as `$data` and `$standard_data`, which contain the original and standardized data, respectively. It also has methods such as `$summary()` and `$plot()`.

***NOTE:*** Inclusion of the `()` after the method name is what distinguishes a method from a field. The parentheses are always required when calling a method, even if there are no arguments.
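
The difference between fields and methods can be sketched with a toy R6 class. The class `ToyData` below is hypothetical and is not part of Pmetrics; it only mimics the pattern of a `$data` field and a `$summary()` method, and it assumes the R6 package is installed.

```{r}
#| echo: true
#| eval: false
#| label: r6-toy-sketch
library(R6)

# Hypothetical toy class, NOT a Pmetrics object
ToyData <- R6Class("ToyData",
  public = list(
    data = NULL, # field: stores the values
    initialize = function(data) {
      self$data <- data
    },
    summary = function() { # method: operates on the field
      c(n = length(self$data), mean = mean(self$data))
    }
  )
)

toy <- ToyData$new(c(2, 4, 6))
toy$data      # field access, no parentheses
toy$summary() # method call, parentheses required
```

Typing `toy$summary` without the parentheses would print the function definition instead of calling it, which is why the parentheses matter.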

## Rust as the backend

**As of v. 3.0**, `r pmetrics()` uses Rust for the core of the package and no longer
uses Fortran. This is for several reasons, including the fact that Rust is a modern
programming language that is more efficient and easier to maintain than Fortran.
Rust is memory safe, which means that it prevents common programming errors that can lead to
memory leaks and crashes. This makes it a more reliable choice for high-performance computing tasks.
Additionally, Rust has a growing ecosystem of libraries and tools that make it easier to work with
data and perform complex computations, which makes it a flexible choice for a wide range of applications.

## Software engines

There are three main software engines that `r pmetrics()` controls currently.

<!-- * **IT2B** is the ITerative 2-stage Bayesian parametric population PK modeling -->

<!-- program. It is generally used to estimate parameter ranges to pass to -->

<!-- NPAG. It will estimate values for population model parameters under the -->

<!-- assumption that the underlying distributions of those values are normal -->

<!-- or transformed to normal, e.g. log normal. -->

-   **NPAG** is the Non-parametric Adaptive Grid software. It will create a non-parametric population model consisting of discrete support points, each with a set of estimates for all parameters in the model plus an associated probability (weight) of that set of estimates. There can be at most one point for each subject in the study population. There is no need for any assumption about the underlying distribution of model parameter values.

-   **NPOD** is the Non-parametric Optimal Design software. Like NPAG, it will create a non-parametric population model consisting of discrete support points, each with a set of estimates for all parameters in the model plus an associated probability (weight) of that set of estimates. While NPAG searches parameter hyperspace randomly, NPOD uses likelihood gradients to search systematically. This usually results in faster convergence, but currently, NPOD is more likely to get stuck in a "local" optimum of the likelihood surface, i.e. it has a higher chance of not finding the globally optimal support points. NPOD is under active development.

-   The **Simulator** is a semi-parametric Monte Carlo simulation software program that can use the output of NPAG or NPOD to build randomly generated response profiles (e.g. time-concentration curves) for a given population model, parameter estimates, and data input. Simulation from a non-parametric joint density model, i.e. NPAG output, is possible, with each point serving as the mean of a multivariate normal distribution, weighted according to the weight of the point. The covariance matrix of the entire set of support points is divided equally among the points for the purposes of simulation.
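
The semi-parametric sampling idea described for the Simulator can be illustrated with a minimal sketch in base R plus the MASS package. This is not the Pmetrics implementation: the support points, the weights, and the parameter names `Ke` and `V` are made up, and "divided equally" is read here as dividing the weighted covariance matrix by the number of support points, which is an assumption.

```{r}
#| echo: true
#| eval: false
#| label: semiparametric-sketch
library(MASS) # for mvrnorm()

set.seed(1)

# Hypothetical support points: 3 points, 2 parameters, with probabilities (weights)
points <- matrix(c(0.10, 50,
                   0.20, 80,
                   0.15, 60),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("Ke", "V")))
weights <- c(0.5, 0.3, 0.2)

# Weighted covariance of the whole set of points
sigma_all <- cov.wt(points, wt = weights)$cov
# Assumption: share the covariance equally by dividing by the number of points
sigma_each <- sigma_all / nrow(points)

# Pick a support point according to its weight, then sample around it
n_sim <- 1000
idx <- sample(nrow(points), n_sim, replace = TRUE, prob = weights)
sims <- t(vapply(
  idx,
  function(i) mvrnorm(1, mu = points[i, ], Sigma = sigma_each),
  numeric(2)
))
colnames(sims) <- colnames(points)
head(sims)
```

Each row of `sims` is one simulated parameter set. In a real simulation these values would then be passed through the structural model to generate the response profiles.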

**Let's look at how these engines are used to fit models to data and simulate.**

## Model functions

### Model fitting

```{mermaid}
%%| eval: true
flowchart LR

subgraph RSession[" "]
  direction TB
  %% DATA["PM_data"]
  mid(("edit"))
  MODEL["PM_model"]
  RESULT["PM_result"]


end

DISK[("Hard Drive")]

MODEL -.-> mid(("edit")) -.-> MODEL
MODEL -- "$fit(PM_data, ...)" --> RESULT
%% RESULT -- "edit" --> MODEL
DISK -- "PM_load()" --> RESULT
RESULT -- "automatic" --> DISK


classDef blue fill:#2f6db6,stroke:#184a8b,color:#fff;
classDef orange fill:#c7662b,stroke:#8a3f18,color:#fff;
classDef disk fill:#d2d3d7,stroke:#7f8084,color:#000;
classDef ghost fill:transparent,stroke:transparent,stroke-width:0px,padding:0px,font-style:italic, color:#555;

class mid ghost;
class DATA,MODEL blue;
class RESULT orange;
class DISK disk;

linkStyle 4 font-style:italic, color:#555

style RSession fill:#e9f0ff,stroke:#9ab0d6,stroke-width:1px,rx:2,ry:2

```

* `r pmetrics()` uses `r gh_help("PM_data")` to create data objects and `r gh_help("PM_model")` to create model objects.
* When a model is created, it is compiled and ready to be combined with the data and parameters (`...`) within the model's `$fit()` method.
* The `$fit()` method uses NPAG/NPOD to create a `r gh_help("PM_result")`, which contains the probability distributions for primary model parameter values as well as many methods to summarize, examine, and plot the results.
* Edit models to try to improve the fit as needed. This step occupies most of your time when modeling.

`PM_data`, `PM_model`, and `PM_result` objects and their methods, including `$fit()`, are extensively documented within R and in subsequent chapters here.
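
The workflow above can be sketched as follows. The file names are hypothetical placeholders, the positional call to `$fit()` follows the `$fit(PM_data, ...)` form in the diagram, and the run number passed to `PM_load()` assumes this is the first run in the working directory. Check `?PM_model` for the actual arguments before relying on this.

```{r}
#| echo: true
#| eval: false
#| label: fit-workflow-sketch
# example, will not run without your own data and model files
library(Pmetrics)

dat <- PM_data$new("data.csv")   # hypothetical data file name
mod <- PM_model$new("model.txt") # hypothetical model file name

mod$fit(dat)       # fit the model to the data; other arguments left at defaults
run1 <- PM_load(1) # assumes the fit above was run 1 in the working directory
```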

:::{.callout-tip}
Help for `r pmetrics()` functions and package data (and for any other package) is available in R by using the `?command` syntax in your R console, as illustrated below.
```{r}
#| echo: true
#| eval: false
#| label: help-example
# get help for PM_model
?PM_model
```
:::

The `$fit()` method in `PM_model` objects replaces these Legacy functions: `ITrun`, `ERRrun`, `NPrun`.

### Simulation

As for all `r pmetrics()` engines, the simulator requires a **model** and **data**. The data serve as the template for the simulation, i.e., the dosing and sampling times, covariates, etc. The model defines the structural and error models to be used. Additionally, the simulator requires a population **prior** joint probability distribution for the parameter values from which it samples.  The population parameter values and their distributions can come from either a previous model fit or be defined directly. More details can be found in the [simulation](simulation.qmd) chapter.
```{mermaid}
%%| eval: true

flowchart LR

subgraph RSession[" "]
  direction TB
  NEW["Prior<br>PM_data<br>PM_model"]
  RESULT["PM_result"]
  SIMULATION["PM_sim"]

end

DISK[("Hard Drive")]


RESULT -- "$sim(<i>PM_model, PM_data</i>, ...)" --> SIMULATION
NEW -- "PM_sim$new(<b>prior, PM_model, PM_data,</b> ...)" --> SIMULATION
SIMULATION -- "$save()" --> DISK
DISK -- "PM_sim$new()" --> SIMULATION


classDef blue fill:#2f6db6,stroke:#184a8b,color:#fff;
classDef orange fill:#c7662b,stroke:#8a3f18,color:#fff;
classDef green fill:#3ca34d,stroke:#276b33,color:#fff;
classDef disk fill:#d2d3d7,stroke:#7f8084,color:#000;

class NEW blue;
class RESULT orange;
class SIMULATION green;
class DISK disk;

style RSession fill:#e9f0ff,stroke:#9ab0d6,stroke-width:1px,rx:2,ry:2

```

As shown in the diagram above, there are two ways to invoke the simulator, both of which replace the Legacy `SIMrun` function.

1.  Use the `$sim()` method attached to `PM_result` objects. In this case, the model, data, and prior parameter value probability distributions are all obtained from the previous fit contained within the `PM_result` object. You can *optionally* override the `PM_result` model or data by providing new values as arguments to the `$sim()` method. The model parameters must match the parameters in the original fit, and the data must contain values for any covariates used in the model.
2.  Use `PM_sim$new()` for models, parameter value probability distributions and template data not derived from a previous fit, e.g. when lifted from an article. Creating simulations this way allows total freedom with respect to the model, the prior, and the data template, but all three inputs (prior, model, data template) are **required**.

Whether called by `$sim()` on a `PM_result` object, or using `PM_sim$new()`, the simulator will execute directly within R and return a `PM_sim` object. There is no longer any need to run the Legacy function `SIMparse`.
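
A hedged sketch of the two ways to invoke the simulator follows. The argument name `nsim` is an assumption, and `my_prior`, `my_model`, and `my_data` are hypothetical objects that you would need to create first; the [simulation](simulation.qmd) chapter describes the actual arguments.

```{r}
#| echo: true
#| eval: false
#| label: sim-two-ways-sketch
# example, will not run without an existing run 1 in the working directory
library(Pmetrics)
run1 <- PM_load(1)

# Way 1: model, data, and prior all come from the previous fit
sim1 <- run1$sim(nsim = 100) # nsim is an assumed argument name

# Way 2: supply all three inputs yourself (hypothetical objects, hence commented out)
# sim2 <- PM_sim$new(my_prior, my_model, my_data)
```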


### Saving

The `r pmetrics()` R6 objects `PM_data`, `PM_result`, `PM_sim`, `PM_valid`, and `PM_pta` all have a `$save()` method. This method saves the object to the hard drive in the current working directory by default. The format is **.rds** which is a binary format used by R to save individual objects. The purpose of the `$save()` method is to enable retrieval of the object at a later time. As of v. 3.0, `PM_model` objects do not have a `$save()` method, as they should be defined directly in your R script. See [models](models.qmd#mb-files) for details.

### Loading

After a successful model fit, `PM_load()` creates a `PM_result` object rather than loading run results into the current environment. Objects suffixed with the run number, as in Legacy `r pmetrics()`, e.g., op.1, final.1, etc., are no longer created.

To load previously saved .rds objects for `PM_sim`, `PM_valid`, and `PM_pta`, use the `$new()` method and provide the name of the .rds file created by the corresponding `$save()` method, including the full or relative path if the file is not in the working directory.
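
The .rds format stores one R object per file, which can be illustrated with base R alone. The object and file name below are made up for the demonstration. For `r pmetrics()` objects, use the `$save()` and `$new()` methods described above rather than these base functions; whether those methods call `saveRDS()` internally is not stated here.

```{r}
#| echo: true
#| eval: false
#| label: rds-roundtrip-sketch
# Hypothetical stand-in object; any single R object can be stored as .rds
obj <- list(run = 1, note = "toy example")

rds_file <- file.path(tempdir(), "toy.rds") # temporary location for the demo
saveRDS(obj, rds_file)        # write one object to a binary .rds file
restored <- readRDS(rds_file) # read it back under any name you like

identical(obj, restored)
```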

### Report generation

The `r gh_help("PM_report")` function is automatically run at the end of a successful fit, and it will generate an HTML page with summaries of the fit, as well as the .Rdata files and other objects. The default browser will be automatically launched for viewing of the HTML report page. You can also regenerate the report at any time by calling the `$report()` method on a `PM_result` object.

:::{code-copy='false'}
```{r echo=T, eval=FALSE, label=report-method}
# example, will not run
run1 <- PM_load(1)
run1$report() # regenerates the report
```
:::

## Other functions

Within Pmetrics there are also functions to manipulate data.csv files and process and plot extracted data, which we will cover in subsequent chapters.

### Data manipulation

Comparisons between the current and Legacy methods are shown for reference.

```{r echo=F, eval=TRUE, label=custom-table}
custom_table("RLcomp_data.csv")
```

### Model selection and diagnostics

Comparisons between the current and Legacy methods are shown for reference.

```{r echo=F, eval=TRUE, label=custom-table-model}
custom_table("RLcomp_valid.csv")
```

### Other functions

Comparisons between the current and Legacy methods are shown for reference.

```{r echo=F, eval=TRUE, label=custom-table-other}
custom_table("RLcomp_other.csv")
```

<!-- * Process data: `make_AUC`, `makeCov`, `makeCycle`, `makeFinal`, `makeOP`, `makePTA`, -->

<!-- `makeErrorPoly` -->

<!-- * Plot data: `plot.PMcov`, `plot.PMcycle`, `plot.PMfinal`, `plot.PMmatrix`, -->

<!-- `plot.PMop`,`plot.PMsim`, `plot.PMvalid`, `plot.PMpta` -->

<!-- * Pmetrics function defaults: `setPMoptions`, `getPMoptions` -->

Again, all functions have extensive help files and examples which can be examined in R by using the `help(command)` or `?command` syntax.