In this article, let's learn what the Kaplan-Meier curve is. In everyday life, we come across many time-to-event examples. What does time-to-event mean? It is a duration variable for each case or subject of interest, with a beginning and an end somewhere along the timeline of the study. Common examples are a clinical study of a drug, wickets falling in an innings of a cricket match, or overhauling a machine before it is decommissioned. Did you notice something in common among these examples? Yes: each is a study of survival. One effective way to estimate the survival function is Kaplan-Meier (KM) analysis. The Kaplan-Meier estimator estimates the survival function, and the Kaplan-Meier curve is the visual representation of that estimate, showing the probability of surviving past each point in time. The curve should approach the true survival function for the population under investigation, provided the sample size is large enough.
In this article, let's see in detail what KM analysis is, how the Kaplan-Meier curve is built, and the math behind calculating the probabilities of survival. But before diving into KM analysis, let's take a brief look at what survival analysis is and the basic notation used in it.
The Kaplan-Meier curve is a graphical representation of the survival function. It is named after Edward Kaplan and Paul Meier, who published the technique in 1958. It is a non-parametric estimate of the survival function that makes no assumptions about the underlying distribution of the data, and it can be computed even when the data are censored. The survival function gives the probability that a subject will survive past time t, and the curve is constructed by plotting this estimate against time.
Survival analysis is a statistical procedure for analysing data in which the outcome variable of interest is the time until an event occurs. The time can be measured in years, months, weeks, or days from the beginning of follow-up until the event occurs. By event, we mean recovery, death, breakdown of a machine, a wicket falling in an innings, or any designated experience of interest that may happen to the case or subject.
Survival analysis has three goals to be addressed:
1. To estimate and interpret survival functions from the data.
2. To compare the survival functions of two or more groups.
3. To assess the relationship of explanatory variables to survival time.
We hope you have a picture of what survival analysis is and its goals. Next, we shall look at the notation used in the analysis and a basic interpretation of the KM curve (a detailed explanation follows).
When using Kaplan-Meier analysis, we should concentrate on three variables:
1. The serial time for each subject.
2. The status at the end of the serial time (event occurred, or censored).
3. The group to which the subject belongs.
The serial times for the individual subjects are arranged from shortest to longest, regardless of when the subjects entered the study. Each stretch of known survival is terminated by the event of interest; this stretch is known as an interval. Only an occurrence of the event defines a known survival interval; censored subjects do not terminate an interval. At the end of a subject's serial time, one of two things has happened.
1. A subject can have the event of interest.
2. The subject is censored. We defined the event just above; now let's define what censored data is.
Censored data simply means that the information about a subject's survival time is incomplete. Most survival analyses suffer from this problem. It can happen when something unfavourable for the study occurs, such as:
1. A subject drops out or withdraws from the study.
2. A subject is lost to follow-up during the study period.
3. The study ends before the subject experiences the event.
After so much theory and explanation of KM analysis, let's move on to the creation and interpretation of the KM curve.
For this, let's consider an example where a drug is being tested on two groups of people (male and female), with six subjects in each group for ease of understanding. The serial time and the status at that serial time are given in the table below. A status of 1 means the event occurred, and 0 means the subject was censored. The objective is to find the cumulative probability of survival and to determine whether the drug's effect differs significantly between the groups.
Table 1 – Initial sorted table for KM analysis
As discussed earlier, the basic elements required for the analysis are: 1. the serial time, 2. the status at the serial time, and 3. the group to which the subject belongs. The data are entered in a table and sorted by ascending serial time within each group. Notice that each group has one censored subject: in the male group the censoring occurs at the end of the trial, while in the female group the subject was censored within the study timeline.
After constructing the table, we can use any statistical tool, such as SPSS, SigmaPlot, R, or Excel, to plot the KM curve. First, let's see how to plot the KM curve and analyse the results in R, and then walk through the statistics and calculations behind the survival probabilities.
Step 1: The packages used for the analysis are survival and survminer. Use install.packages() to install these libraries if they are not already installed in your R workspace.
Step 2: The next step is to load the dataset and examine its structure. The data used for this analysis are the same as shown above, saved as a CSV file and imported into R.
Step 3: We are now ready to create the survival object using the Surv() function of the survival package, storing the result in surv_object. A survival object is essentially a compiled version of the serial times and statuses; a + sign after a survival time indicates a censored data point.
Step 4: The next step is to fit the Kaplan-Meier curves. To do this, we fit the survival object as a function of the group of interest using the survfit() function of the survival package.
The summary of the resulting fit_1 object shows, among other things, the survival times and the proportion of surviving patients at each time point.
The table below is the output of the survival analysis. It shows the time at which each event took place, the number of subjects at risk after each event, the cumulative survival probabilities, the standard error associated with each probability, and its upper and lower 95% confidence limits for both groups (the calculations behind the table are discussed later in this article).
Table 2 – survival analysis output
Step 5: It is now time to plot the KM curve. The corresponding survival curves can be examined by passing the fitted object to the ggsurvplot() function with pval = TRUE. This argument is very useful because it also plots the p-value of a log-rank test, which gives us an idea of whether the groups are significantly different.
In Table 2, notice that the last subject of the female group has no cumulative survival probability assigned to it, and that there is a sudden drop in probability at the third event. In the male group, by contrast, the last subject has a probability associated with it, and the drops in probability are smaller. This is because the female group has a subject censored in the middle of the study (after the second event), so no subject is left at the end from which to calculate a probability, and the probability falls steeply after that point. In the male group, the only censored subject is at the very end, so the probability never reaches zero.
This may seem a little confusing, but worry not: it will become clear in the sections that follow.
Look at the KM curve in the figure. The survival duration of a subject is represented by the length of the horizontal line along the X-axis of serial times; the occurrence of the event terminates the interval. The vertical lines mark events of interest, and the vertical distances between horizontals are important because they show the change in the cumulative probability of surviving to a given time, as read off the Y-axis. For example, if you belong to the male group, your probability of surviving 11 months is 100%, whereas if you are in the female group, your probability of surviving the same time is slightly more than 66%. The steepness of the curve is determined by the survival durations.
Looking at the censored subjects: the one subject censored mid-study in the female group materially reduced the cumulative survival between intervals, whereas the terminally censored subject in the male group did not change the survival probability, since that interval was not terminated by an event.
Table 3 – calculation table
The table above shows what happens behind the production of the KM curve. When it is cross-referenced with the KM curve, it becomes evident that intervals and their attendant probabilities are constructed only for events of interest, not for censored subjects. Because an event ends one interval and begins another, there is always one more interval than there are events.
The table also explains how the curves end. In the male group, the curve ends without creating another interval below it; the cumulative probability of surviving this long is determined by the last horizontal, the sixth interval, and is 0.166. In the female group, the curve drops to zero after the fifth interval, so the sixth interval's horizontal lies on the X-axis.
Looking at the survival probabilities, it can be confusing that there are two of them: 1. the cumulative probability and 2. the interval probability. The cumulative probability is the probability of surviving to the beginning of, and throughout, an interval; this is what is graphed on the Y-axis of the curve. The interval survival rate is the probability of surviving past the interval, i.e. still being alive at its end and beginning the next.
Censoring affects survival rates. Censored observations that coincide with an event are usually considered to fall immediately after the event. Censoring removes the subject from the denominator, i.e., from the individuals still at risk. For example, in the female group there were three subjects surviving interval four and available to be at risk in interval five. However, during interval four one was censored; therefore, only two were left at risk in interval five: as seen in Table 2, the denominator went from four in interval four to two in interval five.
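To make this arithmetic concrete, here is a minimal pure-Python sketch of the Kaplan-Meier calculation. The serial times below are hypothetical (the article's Table 1 values are not reproduced here), but they mirror the structure described above: six subjects per group, five events each, with the male group's censoring at the end of the trial and the female group's censoring mid-study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimator: returns a list of
    (event_time, number_at_risk, cumulative_survival)."""
    subjects = sorted(zip(times, events))
    n_at_risk = len(subjects)
    survival = 1.0
    curve = []
    # walk through the distinct serial times in ascending order
    for t in sorted({t for t, _ in subjects}):
        deaths = sum(1 for tt, e in subjects if tt == t and e == 1)
        censored = sum(1 for tt, e in subjects if tt == t and e == 0)
        if deaths:
            # interval survival rate = (at risk - events) / at risk
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, n_at_risk, survival))
        # both events and censorings leave the risk set after time t
        n_at_risk -= deaths + censored
    return curve

# Male group (hypothetical times): five events, last subject censored at the end.
male = kaplan_meier([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 0])
# Female group (hypothetical times): one subject censored mid-study.
female = kaplan_meier([1, 2, 3, 3.5, 4, 5], [1, 1, 1, 0, 1, 1])
```

The male curve ends at a cumulative survival of about 0.167, matching the 0.166 figure discussed above, while the female curve drops to zero: the mid-study censoring shrinks the denominator, so the remaining events cause steeper falls.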
Thus, we calculated the survival probabilities for each subject in the two groups. Although the male group appears to have a greater probability of survival than the female group, the log-rank test's p-value of 0.19 tells us there is no significant difference between the groups. The null hypothesis is that there is no difference; the alternative hypothesis is that the groups differ significantly. Since the p-value is greater than 0.05, we fail to reject the null hypothesis. This brings us to the end of the blog on the Kaplan-Meier curve. We hope you enjoyed it.
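For readers curious about where that p-value comes from, here is a hedged, pure-Python sketch of the two-sample log-rank test (the data below are hypothetical, not the article's Table 1, so they will not reproduce the 0.19 exactly). At each event time it compares observed events in one group against the number expected if both groups shared the same survival function.

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Two-sample log-rank test. times*/events* are parallel lists;
    event = 1, censored = 0. Returns (chi_square, p_value)."""
    pooled = list(zip(times1 + times2, events1 + events2))
    event_times = sorted({t for t, e in pooled if e == 1})
    u = 0.0  # sum over event times of (observed - expected) in group 1
    v = 0.0  # variance of u under the null hypothesis
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)  # group-1 subjects at risk
        n2 = sum(1 for x in times2 if x >= t)  # group-2 subjects at risk
        n = n1 + n2
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e == 1)
        d = d1 + sum(1 for x, e in zip(times2, events2) if x == t and e == 1)
        u += d1 - d * n1 / n  # observed minus expected group-1 events
        if n > 1:
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = u * u / v if v > 0 else 0.0
    # p-value for a chi-square statistic with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Two identical groups: observed equals expected, so chi-square is 0 and p is 1.
chi2_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

A p-value above 0.05, as in the article's 0.19, means we fail to reject the null hypothesis that the two survival curves are the same.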
Original article source at: https://www.mygreatlearning.com
#Kaplan-Meier
Permissionless deployment of Curve metapools.
The metapool factory has several core components:
Factory is the main contract used to deploy new metapools. It also acts as a registry for finding the deployed pools and querying information about them. See the documentation for more detailed information.
Testing is performed in a forked mainnet environment.
To run the unit tests:
brownie test
You can optionally include the following flags:
--decimals: the number of decimal places for the token used to test the factory pool. Default is 18.
--return_value: the return value given by the token used to test the factory pool. Valid options are True and None.
To deploy the contracts, first modify the deployment script to unlock the account you wish to deploy from. Then:
brownie run deploy --network mainnet
Download Details:
Author: curvefi
Source Code: https://github.com/curvefi/curve-factory
License: View license
#curve #dao #blockchain #smartcontract #python
The Curve DAO Token (CRV) is the utility token of the Curve Finance ecosystem, primarily created to incentivise liquidity providers on the platform. When staked, CRV becomes veCRV, a non-transferable, linearly decaying governance token with two special abilities: first, holders of veCRV receive boosted rewards on provided liquidity (up to a maximum of 2.5x); second, holders receive 50% of all the trading fees collected across all Curve Finance pools (including factory pools and altchains) in the form of 3CRV.
With the release of the Curve veBoost, veCRV holders can now delegate a fixed allotment of their veCRV boost to third-party accounts in the form of an NFT. These veBoost tokens are wrapped packages of boost which eligible gauges (currently only factory gauges) use to determine boosted rewards on provided liquidity. This means any account can receive boosted rewards on provided liquidity without holding veCRV.
For eligible gauges, the equation for determining your adjusted veCRV balance is:
Adjusted veCRV Balance = veCRV Balance - delegated veBoost + received veBoost
Restrictions:
- Delegated boosts have a minimum duration of 1 week.
- The maximum amount an account can delegate is its veCRV balance minus any outstanding veBoosts.
- Delegated veBoost is accounted as abs(sum(delegated veBoost)), meaning an account's veCRV balance can never get inflated due to negative outstanding boosts.
- Received veBoost is accounted as max(received veBoost, 0), meaning an account's received boost balance will never result in decreasing its vanilla veCRV balance.
The adjusted formula on chain is therefore:
Adjusted veCRV Balance = veCRV Balance - abs(delegated veBoost) + max(received veBoost, 0)
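As a sanity check, the on-chain accounting rule can be expressed as a small Python function. This is a sketch of the formula above, not the actual Vyper implementation:

```python
def adjusted_vecrv(vecrv_balance, delegated_veboost, received_veboost):
    """Adjusted veCRV balance used by eligible gauges.

    Delegated boost always reduces the balance (abs), and received boost
    can only ever add to it (max with 0), so an account that never
    delegates can never end up below its vanilla veCRV balance."""
    return vecrv_balance - abs(delegated_veboost) + max(received_veboost, 0)

# Delegating 40 and receiving 10 nets out to a 30 reduction.
assert adjusted_vecrv(100, 40, 10) == 70
# A negative received boost is clamped to zero rather than deducted.
assert adjusted_vecrv(100, 0, -25) == 100
```

The clamping terms are exactly why a non-delegating account's adjusted balance can never fall below its vanilla balance.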
In fact, if an account does not participate in delegating veBoost, their adjusted veCRV balance will never be below their vanilla veCRV balance.
Also check the requirements.txt for the required dependencies.
Testing is performed in a local development environment
To run the unit tests:
brownie test --stateful false
To run the state tests:
brownie test --stateful true
To deploy the contracts, first modify the deployment script
to unlock the account you wish to deploy from. Then:
brownie run deploy --network mainnet
Download Details:
Author: curvefi
Source Code: https://github.com/curvefi/curve-veBoost
License: View license
#curve #dao #blockchain #smartcontract #python
Vyper contracts used in the Curve Governance DAO.
Curve DAO consists of multiple smart contracts connected by Aragon. Interaction with Aragon occurs through a modified implementation of the Aragon Voting App. Aragon's standard one token, one vote method is replaced with a weighting system based on locking tokens. Curve DAO has a token (CRV) which is used for both governance and value accrual.
View the documentation for a more in-depth explanation of how Curve DAO works.
To get started, first create and initialize a Python virtual environment. Next, clone the repo and install the developer dependencies:
git clone https://github.com/curvefi/curve-dao-contracts.git
cd curve-dao-contracts
pip install -r requirements.txt
The test suite is split between unit and integration tests. To run the entire suite:
brownie test
To run only the unit tests or integration tests:
brownie test tests/unitary
brownie test tests/integration
See the deployment documentation for detailed information on how to deploy Curve DAO.
Curve DAO contracts have been audited by Trail of Bits and Quantstamp. These audit reports are made available on the Curve website.
There is also an active bug bounty for issues which can lead to substantial loss of money, critical bugs such as a broken liveness condition, or irreversible loss of funds.
You may find the following guides useful:
If you have any questions about this project, or wish to engage with us:
Download Details:
Author: curvefi
Source Code: https://github.com/curvefi/curve-dao-contracts
License: MIT license
#curve #defi #blockchain #smartcontract #python
Vyper contracts used in Curve exchange pools.
Curve is an exchange liquidity pool on Ethereum designed for extremely efficient stablecoin trading and low risk, supplemental fee income for liquidity providers, without an opportunity cost.
Curve allows users to trade between correlated cryptocurrencies with a bespoke low-slippage, low-fee algorithm. The liquidity pool is also supplied to lending protocols, where it generates additional income for liquidity providers.
Curve contracts are compiled using Vyper, however installation of the required Vyper versions is handled by Brownie.
To get started, first create and initialize a Python virtual environment. Next, clone the repo and install the developer dependencies:
git clone https://github.com/curvefi/curve-contract.git
cd curve-contract
pip install -r requirements.txt
The main contracts are organised within:
- contracts/pool-templates
- contracts/pools
See the documentation within contracts and its subdirectories for more detailed information on how to get started developing on Curve.
The test suite contains common tests for all Curve pools, as well as unique per-pool tests. To run the entire suite:
brownie test
To run tests on a specific pool:
brownie test tests/ --pool <POOL NAME>
Valid pool names are the names of the subdirectories within contracts/pools. For templates, prepend template- to the subdirectory names within contracts/pool-templates; for example, the base template is template-base.
You can optionally include the --coverage flag to view a coverage report upon completion of the tests.
To deploy a new pool:
1. Ensure the pooldata.json for the pool you are deploying contains all the necessary fields.
2. Edit the configuration settings within scripts/deploy.py.
3. Test the deployment locally against a forked mainnet:
brownie run deploy --network mainnet-fork -I
When the script completes it will open a console. You should call the various getter methods on the deployed contracts to ensure the pool has been configured correctly.
4. Deploy the pool to the mainnet:
brownie run deploy --network mainnet
Be sure to open a pull request that adds the deployment addresses to the pool README.md.
Curve smart contracts have been audited by Trail of Bits. These audit reports are made available on the Curve website.
There is also an active bug bounty for issues which can lead to substantial loss of money, critical bugs such as a broken liveness condition, or irreversible loss of funds.
Download Details:
Author: curvefi
Source Code: https://github.com/curvefi/curve-contract
License: View license
#curve #dao #blockchain #smartcontract