What Is the Kaplan-Meier Curve?

In this article, we look at the Kaplan-Meier curve. In everyday life we come across many time-to-event examples. What does time-to-event mean? It is a duration variable recorded for each case/subject of interest, with a beginning and an end anywhere along the timeline of the complete study. Common examples are a clinical study of a drug, wickets falling in an innings of a cricket match, and the overhaul of a machine before it is decommissioned. Did you notice what these examples have in common? Yes, each is a study of survival. One effective way to estimate the survival function is Kaplan-Meier (KM) analysis. The Kaplan-Meier estimator is used to estimate the survival function, and the Kaplan-Meier curve is the visual representation of that estimate: it shows the probability of surviving (not yet having experienced the event) past each point in time. The curve should approach the true survival function for the population under investigation, provided the sample size is large enough.

In this article, let’s see in detail what KM analysis is, how the Kaplan-Meier curve is built, and the math behind calculating the probabilities of survival. But before diving into KM analysis, we take a brief tour of survival analysis and the basic notation used in it.

What is the Kaplan-Meier curve?

The Kaplan-Meier curve is a graphical representation of the survival function. It is named after Edward L. Kaplan and Paul Meier, who published the technique in 1958. It is a non-parametric estimate of the survival function, that is, it makes no assumptions about the underlying distribution of the survival times. The Kaplan-Meier method is designed for data in which some observations are censored (censoring is defined later in this article). It shows the estimated probability that a subject survives beyond time t. The curve is constructed by plotting the estimated survival function against time.

Survival Analysis

Survival analysis is a statistical procedure for data analysis in which the outcome variable of interest is the time until an event occurs. The time can be measured in years, months, weeks or days from the beginning of follow-up until the event occurs. By event, we mean recovery, death, breakdown of a machine, a wicket falling in an innings or any designated experience of interest that may happen to the case/subject.

Goals of Survival Analysis

Survival analysis has three goals to be addressed:

1. To estimate and interpret survivor and/or hazard functions from survival data
2. To compare survivor and/or hazard functions
3. To assess the relationship of explanatory variables to survival time

We hope you now have a picture of what survival analysis is and what its goals are. Next, we look at the notation used in the analysis and a basic interpretation of the KM curve (a detailed explanation follows).

Basics of Kaplan Meier curve

When using Kaplan Meier analysis, we should concentrate on three variables:

1. Serial time of the subject
2. Their status at the end of their serial time (event occurrence or censored)
3. The group of study they belong to

The serial times of the individual subjects should be arranged from shortest to longest, regardless of when the subjects entered the study. A serial time duration of known survival is terminated by the event of interest; this duration is known as an interval. Only an occurrence of the event defines a known survival time interval, whereas censored subjects do not terminate the interval. So one of two things can happen to each subject:

1. The subject has the event of interest.

2. The subject is censored. Having defined an event above, we now define censored data.

Censored data

Put simply, data are censored when the information about a subject’s survival time is incomplete. Most survival analyses have to deal with this problem. It can arise, for example, when:

1. A person does not experience the event before the study ends
2. A person is lost to follow-up during the study period
3. A person withdraws from the study for some reason

Understanding the KM Analysis

After so much theory and explanation of KM analysis, we now move on to creating and interpreting the KM curve.

For this, let’s consider an example where a drug is being tested on two groups of people (male and female). There are six subjects in each group (for ease of understanding). The serial time and the status at the serial time are given in the table below. A status of 1 at the serial time means the event occurred, and 0 means the subject was censored. The objective is to find the cumulative probability of survival and to determine whether survival differs significantly between the groups.

Table 1 – Initial sorted table for KM analysis

As discussed earlier, the basic elements required for the analysis are (1) the serial time, (2) the status at the serial time and (3) the group to which the subject belongs. The data are entered in a table and sorted by ascending serial time, beginning with the shortest time in each group. Notice that each group has one censored subject. In the male group the subject is censored at the end of the trial, and in the female group the subject is censored within the study timeline.
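To make the layout of such a table concrete, here is a minimal Python sketch (the article itself works in R). The serial times are hypothetical values chosen only to match the description above (six subjects per group, one censored subject per group); they are not the values of Table 1, and the `Subject` type is an illustrative name.

```python
from typing import NamedTuple


class Subject(NamedTuple):
    group: str
    time: int    # serial time, in hypothetical units
    status: int  # 1 = event occurred, 0 = censored


# Hypothetical data for illustration only; NOT the article's Table 1.
subjects = [
    Subject("male", 9, 1), Subject("male", 3, 1), Subject("male", 12, 0),
    Subject("male", 5, 1), Subject("male", 7, 1), Subject("male", 11, 1),
    Subject("female", 6, 1), Subject("female", 1, 1), Subject("female", 4, 0),
    Subject("female", 2, 1), Subject("female", 10, 1), Subject("female", 8, 1),
]

# Sort by group, then by ascending serial time. At tied times, events (1)
# are placed before censored subjects (0).
table = sorted(subjects, key=lambda s: (s.group, s.time, -s.status))

for row in table:
    print(f"{row.group:<7} {row.time:>3} {row.status}")
```

Sorting up front keeps the later interval-by-interval calculation a simple pass over the rows.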

After constructing the table, we can use a statistical tool such as SPSS, SigmaPlot, R or Excel to plot the KM curve. First, let us see how to plot the KM curve and analyse the results in R; then we walk through the statistics and calculations behind the survival probabilities.

KM Analysis using R

Step 1: The packages used for the analysis are survival and survminer. Use install.packages() to install them if they are not already installed in your R library.

Step 2: The next step is to load the dataset and examine its structure. The data we will use for this analysis is the same as shown above. The data is saved as a CSV file and imported into R for the analysis.

Step 3: We are now ready to create the survival object using the Surv function of the survival package. The object is stored in surv_object. A survival object is essentially a combined representation of the serial time and status. When it is printed, a + sign after a survival time indicates a censored data point.
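The "+" notation can be imitated in a few lines of Python. This is only a sketch of the printed convention, not the R `Surv` call itself; the function name and the data are hypothetical.

```python
def format_survival_times(times, statuses):
    """Render serial times the way a printed survival object marks them:
    a trailing '+' means the subject was censored at that time."""
    return [f"{t}" if s == 1 else f"{t}+" for t, s in zip(times, statuses)]


# Hypothetical serial times and statuses (1 = event, 0 = censored).
print(format_survival_times([1, 2, 4, 6, 8, 10], [1, 1, 0, 1, 1, 1]))
# ['1', '2', '4+', '6', '8', '10']
```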

Step 4: The next step is to fit the Kaplan-Meier curves. To do this, we fit the survival function using the survival object and the group of interest. The fitting is done with the survfit function of the survival package. The survival object created in the previous step is given as a function of the group we have considered for the analysis.

The summary of the resulting fit_1 object shows, among other things, the survival times and the proportion of surviving patients at every time point.

The table below is the table output of the survival analysis. It shows the time at which each event took place, the number of subjects at risk just before each event, the cumulative survival probabilities, the standard error associated with each probability and its lower and upper 95% confidence limits for both groups (the calculation behind the table and the stats are discussed later in this article).

Table 2 – survival analysis output

Step 5: It is now time to plot the KM curve. The survival curves can be examined by passing the fitted object (fit_1) to the ggsurvplot() function with pval = TRUE. This argument is very useful because it adds the p-value of a log-rank test to the plot, which helps us judge whether the groups are significantly different.

In Table 2, it can be seen that the last subject of the female group has no cumulative probability of survival assigned to it, and there is a sudden drop in the probability for the third subject. In the male group, by contrast, the last subject has a probability associated with it and the fall in probability is slightly smaller than in the female group. This is because in the female group a subject was censored in the middle (after the second event), which leaves fewer subjects at risk for the later events; the probability therefore falls more steeply after the second event, and at the end no subject is left for whom to calculate a probability. In the male group, the only censored subject is at the end, so the curve does not reach zero.

I know this is a little confusing, but worry not: it is cleared up in the sections that follow.

Decoding the KM curve and Analysis

Look at the KM curve in the figure. The survival duration of a subject is represented by the length of a horizontal line along the X-axis of serial times. The occurrence of the event terminates the interval. The vertical lines mark the event of interest happening, and the vertical distances between horizontals are important because they show the change in the cumulative probability of surviving a given time, as read on the Y-axis. For example, if you belong to the male group, your probability of surviving 11 months is 100% (the X-axis is in years); conversely, if you are in the female group, your probability of surviving the same time is slightly more than 66%. The steepness of the curve is determined by the survival durations.

Looking at the censored subjects, the one subject censored in the female group materially reduced the cumulative survival in the intervals that followed, whereas the terminally censored subject in the male group did not change the survival probability, and the interval was not terminated by an event.

Table 3 – calculation table

The table above shows what happens behind the production of the KM curve. When the table is cross-referenced with the KM curve, it is evident that intervals and the attendant probabilities are only constructed for events of interest and not for censored subjects. Because an event ends one interval and begins another, there are more intervals than events.

The table also explains the way the curves end. In the male group, the curve ends without creating another interval below. The cumulative probability of surviving this long is given by the last horizontal, the sixth interval, and is 0.166. In the female group, the curve drops to zero after the fifth interval, so that the horizontal of the sixth interval lies on the X-axis.

Looking at the probabilities of survival, it can be a little confusing that there are two of them: (1) the cumulative probability and (2) the interval probability. The cumulative probability is the probability of survival at the beginning of and throughout the interval; this is what is graphed on the Y-axis of the curve. The interval survival rate is the probability of surviving past the interval, i.e. still surviving after the interval and beginning the next.
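The relation between the two probabilities can be written compactly. The symbols below are not used in the article and are introduced here as assumptions: $t_i$ is the $i$-th event time, $n_i$ the number of subjects at risk just before $t_i$, and $d_i$ the number of events at $t_i$.

```latex
p_i = \frac{n_i - d_i}{n_i}
\qquad\text{(interval survival probability)}

\hat{S}(t) = \prod_{i:\, t_i \le t} p_i
           = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
\qquad\text{(cumulative survival probability)}
```

As a check against the male group described above, assuming its five events occur at distinct times and the single censored subject comes last: $\frac{5}{6}\cdot\frac{4}{5}\cdot\frac{3}{4}\cdot\frac{2}{3}\cdot\frac{1}{2} = \frac{1}{6} \approx 0.167$, which agrees with the final value of about 0.166 quoted for that group.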

Censoring affects survival rates. Censored observations that coincide with an event are usually considered to fall immediately after the event. Censoring removes the subject from the denominator, i.e., the individuals still at risk. For example, in Group 2, three subjects survived interval four and would have been available to be at risk in interval five. However, one was censored during interval four; therefore, only two were left to be at risk in interval five, i.e., as seen in Table 2, the denominator went from four in interval four to two in interval five.
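The effect of censoring on the denominator can be traced with a small product-limit calculation. The following is a pure-Python sketch, not the R code used in the article; the function name and the serial times are hypothetical, chosen only so that one group has a subject censored in the middle and the other has a subject censored at the end.

```python
from fractions import Fraction


def kaplan_meier(times, statuses):
    """Return one row per event time:
    (time, n_at_risk, n_events, interval_prob, cumulative_prob).
    statuses: 1 = event, 0 = censored."""
    data = sorted(zip(times, statuses))
    n_at_risk = len(data)
    cumulative = Fraction(1)
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [s for (u, s) in data if u == t]
        events = sum(same_time)
        if events:
            # Subjects censored at t still count as at risk for this event.
            interval = Fraction(n_at_risk - events, n_at_risk)
            cumulative *= interval
            rows.append((t, n_at_risk, events, interval, cumulative))
        # Events AND censored subjects both leave the risk set afterwards.
        n_at_risk -= len(same_time)
        i += len(same_time)
    return rows


# Hypothetical data: the "female" group has a subject censored at time 4.
female = kaplan_meier([1, 2, 4, 6, 8, 10], [1, 1, 0, 1, 1, 1])
male = kaplan_meier([3, 5, 7, 9, 11, 12], [1, 1, 1, 1, 1, 0])

for t, n, d, p, s in female:
    print(f"t={t:>2}  at risk={n}  events={d}  interval={float(p):.3f}  cumulative={float(s):.3f}")
```

In this sketch the censored subject never produces a row of its own, but it lowers the number at risk for the next event from four to three, which is why the curve for that group falls faster afterwards.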

Conclusion

Thus, we calculated the survival probabilities for two different groups. Although the male group appears to have a greater probability of survival than the female group, the log-rank test’s p-value of 0.19 tells us that the difference between the groups is not statistically significant. The null hypothesis is that there is no difference between the groups, and the alternative hypothesis is that the groups differ. Since the p-value is greater than 0.05, we fail to reject the null hypothesis. This brings us to the end of the blog on the Kaplan-Meier curve. We hope you enjoyed it.
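For readers curious about what a log-rank test computes, here is a minimal pure-Python sketch of the standard two-group version. It is not the code behind the p-value reported above: the function name and the data are hypothetical, so the resulting p-value should not be expected to equal 0.19.

```python
import math


def logrank_test(group_a, group_b):
    """Two-group log-rank test. Each group is a list of (time, status)
    pairs with status 1 = event, 0 = censored.
    Returns (chi-square statistic, p-value on 1 degree of freedom)."""
    event_times = sorted({t for t, s in group_a + group_b if s == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in group_a if u >= t)  # at risk in A just before t
        n_b = sum(1 for u, _ in group_b if u >= t)  # at risk in B just before t
        d_a = sum(1 for u, s in group_a if u == t and s == 1)
        d_b = sum(1 for u, s in group_b if u == t and s == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # events expected in A under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data, not the article's Table 1.
male = [(3, 1), (5, 1), (7, 1), (9, 1), (11, 1), (12, 0)]
female = [(1, 1), (2, 1), (4, 0), (6, 1), (8, 1), (10, 1)]

chi2, p = logrank_test(male, female)
print(f"chi-square = {chi2:.3f}, p-value = {p:.3f}")
```

The decision rule is the one described above: if the p-value exceeds the chosen threshold (0.05 here), we fail to reject the null hypothesis of no difference between the groups.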

Original article source at: https://www.mygreatlearning.com

#Kaplan-Meier

curve-factory

Permissionless deployment of Curve metapools.

Overview

The metapool factory has several core components:

• `Factory` is the main contract used to deploy new metapools. It also acts as a registry for finding the deployed pools and querying information about them.
• New pools are deployed via a proxy contract. The implementation contract targeted by the proxy is determined according to the base pool. This is the same technique used to create pools in Uniswap V1.
• Deposit contracts ("zaps") are used for wrapping and unwrapping underlying assets when depositing into or withdrawing from pools.

See the documentation for more detailed information.

Testing

Testing is performed in a forked mainnet environment.

To run the unit tests:

```
brownie test
```

You can optionally include the following flags:

• `--decimals`: The number of decimal places for the token used to test the factory pool. Default is 18.
• `--return_value`: The return value given by the token used to test the factory pool. Valid options are `True` and `None`.

Deployment

To deploy the contracts, first modify the `deployment script` to unlock the account you wish to deploy from. Then:

```
brownie run deploy --network mainnet
```

Author: curvefi
Source Code: https://github.com/curvefi/curve-factory

#curve #dao #blockchain #smartcontract #python

Curve Voting Escrow Boost Delegation

The Curve DAO Token is the utility token of the Curve Finance ecosystem, primarily created to incentivise liquidity providers on the platform. However, when staked the CRV token becomes veCRV, a non-transferable, linearly decaying, governance token with 2 special abilities. The first is holders of veCRV receive boosted rewards on provided liquidity (up to a maximum of 2.5x), and the second is holders receive 50% of all the trading fees collected across all Curve Finance pools (including factory pools + altchains) in the form of 3CRV.

With the release of the Curve veBoost, veCRV holders can now delegate a fixed allotment of their veCRV boost to third-party accounts in the form of an NFT. These veBoost tokens are wrapped packages of boost which eligible gauges (currently only factory gauges) use to determine boosted rewards on provided liquidity. This means any account can receive boosted rewards on provided liquidity, without holding veCRV.

For eligible gauges, the equation for determining your adjusted veCRV balance is:

`Adjusted veCRV Balance = veCRV Balance - delegated veBoost + received veBoost`

Restrictions:

• Minimum delegation period is `1 Week`
• Maximum delegation period is the delegator's veCRV lock end
• Delegators can't delegate more than their `veCRV balance - any outstanding veBoosts`
• Delegators can't newly delegate (or extend a delegation) with any outstanding negative veBoosts
• Delegators can't cancel a veBoost before its cancel time
• Third parties can't cancel a veBoost unless it is expired
• Delegated boost is internally equal to `abs(sum(delegated veBoost))`, meaning an account's veCRV balance can never get inflated due to negative outstanding boosts
• Received boost is internally equal to `max(received veBoost, 0)`, meaning an account's received boost balance will never result in decreasing their vanilla veCRV balance

The adjusted formula on chain therefore is:

`Adjusted veCRV Balance = veCRV Balance - abs(delegated veBoost) + max(received veBoost, 0)`

In fact, if an account does not participate in delegating veBoost, their adjusted veCRV balance will never be below their vanilla veCRV balance.
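The on-chain formula above can be mirrored off-chain in a few lines of Python. This is only an illustrative sketch: the function name and the plain integer arguments are assumptions, and it is not the contract's actual code.

```python
def adjusted_vecrv_balance(vecrv_balance, delegated_veboost, received_veboost):
    """Adjusted balance = veCRV balance - abs(delegated) + max(received, 0).
    Hypothetical helper mirroring the formula in the text."""
    return vecrv_balance - abs(delegated_veboost) + max(received_veboost, 0)


# An account that delegates nothing never drops below its vanilla balance,
# even if the boost it received has gone negative.
print(adjusted_vecrv_balance(100, 0, -5))   # 100
# A negative outstanding delegation cannot inflate the balance.
print(adjusted_vecrv_balance(100, -30, 0))  # 70
```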

Dependencies

Also check the requirements.txt

Testing

Testing is performed in a local development environment.

To run the unit tests:

```
brownie test --stateful false
```

To run the state tests:

```
brownie test --stateful true
```

Deployment

To deploy the contracts, first modify the `deployment script` to unlock the account you wish to deploy from. Then:

```
brownie run deploy --network mainnet
```

Author: curvefi
Source Code: https://github.com/curvefi/curve-veBoost

#curve #dao #blockchain #smartcontract #python

curve-dao-contracts

Vyper contracts used in the Curve Governance DAO.

Overview

Curve DAO consists of multiple smart contracts connected by Aragon. Interaction with Aragon occurs through a modified implementation of the Aragon Voting App. Aragon's standard one token, one vote method is replaced with a weighting system based on locking tokens. Curve DAO has a token (CRV) which is used for both governance and value accrual.

View the documentation for a more in-depth explanation of how Curve DAO works.

Testing and Development

Setup

To get started, first create and initialize a Python virtual environment. Next, clone the repo and install the developer dependencies:

```
git clone https://github.com/curvefi/curve-dao-contracts.git
cd curve-dao-contracts
pip install -r requirements.txt
```

Running the Tests

The test suite is split between unit and integration tests. To run the entire suite:

```
brownie test
```

To run only the unit tests or integration tests:

```
brownie test tests/unitary
brownie test tests/integration
```

Deployment

See the deployment documentation for detailed information on how to deploy Curve DAO.

Audits and Security

Curve DAO contracts have been audited by Trail of Bits and Quantstamp. These audit reports are made available on the Curve website.

There is also an active bug bounty for issues which can lead to substantial loss of money, critical bugs such as a broken live-ness condition, or irreversible loss of funds.

Author: curvefi
Source Code: https://github.com/curvefi/curve-dao-contracts

#curve #defi #blockchain #smartcontract #python

curve-contract

Vyper contracts used in Curve exchange pools.

Overview

Curve is an exchange liquidity pool on Ethereum designed for extremely efficient stablecoin trading and low risk, supplemental fee income for liquidity providers, without an opportunity cost.

Curve allows users to trade between correlated cryptocurrencies with a bespoke low slippage, low fee algorithm. The liquidity pool is also supplied to a lending protocol where it generates additional income for liquidity providers.

Testing and Development

Dependencies

Curve contracts are compiled using Vyper, however installation of the required Vyper versions is handled by Brownie.

Setup

To get started, first create and initialize a Python virtual environment. Next, clone the repo and install the developer dependencies:

```
git clone https://github.com/curvefi/curve-contract.git
cd curve-contract
pip install -r requirements.txt
```

Organization and Workflow

See the documentation within `contracts` and its subdirectories for more detailed information on how to get started developing on Curve.

Running the Tests

The test suite contains common tests for all Curve pools, as well as unique per-pool tests. To run the entire suite:

```
brownie test
```

To run tests on a specific pool:

```
brownie test tests/ --pool <POOL NAME>
```

Valid pool names are the names of the subdirectories within `contracts/pools`. For templates, prepend `template-` to the subdirectory names within `contracts/pool-templates`. For example, the base template is `template-base`.
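The naming rule can be sketched as a small helper. The function name is hypothetical, and the pool directory names in the example are placeholders; only `template-base` is taken from the text above.

```python
def valid_pool_names(pool_dirs, template_dirs):
    """Build the names accepted by --pool from the subdirectory names of
    contracts/pools and contracts/pool-templates (hypothetical helper)."""
    pools = sorted(pool_dirs)
    templates = [f"template-{name}" for name in sorted(template_dirs)]
    return pools + templates


# Placeholder directory names for illustration only.
print(valid_pool_names(["poolB", "poolA"], ["base"]))
# ['poolA', 'poolB', 'template-base']
```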

You can optionally include the `--coverage` flag to view a coverage report upon completion of the tests.

Deployment

To deploy a new pool:

1. Ensure the `pooldata.json` for the pool you are deploying contains all the necessary fields.
2. Edit the configuration settings within `scripts/deploy.py`.
3. Test the deployment locally against a forked mainnet.

   ``brownie run deploy --network mainnet-fork -I``

   When the script completes it will open a console. You should call the various getter methods on the deployed contracts to ensure the pool has been configured correctly.

4. Deploy the pool to the mainnet.

   ``brownie run deploy --network mainnet``

   Be sure to open a pull request that adds the deployment addresses to the pool `README.md`.

Audits and Security

Curve smart contracts have been audited by Trail of Bits. These audit reports are made available on the Curve website.

There is also an active bug bounty for issues which can lead to substantial loss of money, critical bugs such as a broken live-ness condition, or irreversible loss of funds.