Rupert Beatty

1677577860

Sentry-cocoa: The official Sentry SDK for iOS, tvOS, macOS, watchOS

Sentry 
 

Official Sentry SDK for iOS / tvOS / macOS / watchOS (1).

Bad software is everywhere, and we're tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. If you want to join us, check out our open positions.

This SDK is written in Objective-C but also provides a nice Swift interface.

Where is the master branch?

We renamed the default branch from master to main.

Initialization

Remember to call this as early in your application life cycle as possible, ideally in applicationDidFinishLaunching in your AppDelegate.

import Sentry

// ....

SentrySDK.start { options in
    options.dsn = "___PUBLIC_DSN___"
    options.debug = true // Helpful to see what's going on
}    
@import Sentry;

// ....

[SentrySDK startWithConfigureOptions:^(SentryOptions *options) {
    options.dsn = @"___PUBLIC_DSN___";
    options.debug = @YES; // Helpful to see what's going on
}];

For more information, check out the docs.

(1) Limited symbolication support and no crash handling.

Blog posts

Mobile Vitals - Four Metrics Every Mobile Developer Should Care About.

How to use Sentry Attachments with Mobile Applications.

Close the Loop with User Feedback.

A Sanity Listicle for Mobile Developers.


Download Details:

Author: Getsentry
Source Code: https://github.com/getsentry/sentry-cocoa 
License: MIT license

#swift #macos #ios #report #error #objective-c 

Nat Grady

1667429460

Report: Automated Reporting Of Objects in R

Report 

“From R to your manuscript”

report’s primary goal is to bridge the gap between R’s output and the formatted results contained in your manuscript. It automatically produces reports of models and data frames according to best practices guidelines (e.g., APA’s style), ensuring standardization and quality in results reporting.

library(report)

model <- lm(Sepal.Length ~ Species, data = iris)
report(model)
# We fitted a linear model (estimated using OLS) to predict Sepal.Length with
# Species (formula: Sepal.Length ~ Species). The model explains a statistically
# significant and substantial proportion of variance (R2 = 0.62, F(2, 147) =
# 119.26, p < .001, adj. R2 = 0.61). The model's intercept, corresponding to
# Species = setosa, is at 5.01 (95% CI [4.86, 5.15], t(147) = 68.76, p < .001).
# Within this model:
# 
#   - The effect of Species [versicolor] is statistically significant and positive
# (beta = 0.93, 95% CI [0.73, 1.13], t(147) = 9.03, p < .001; Std. beta = 1.12,
# 95% CI [0.88, 1.37])
#   - The effect of Species [virginica] is statistically significant and positive
# (beta = 1.58, 95% CI [1.38, 1.79], t(147) = 15.37, p < .001; Std. beta = 1.91,
# 95% CI [1.66, 2.16])
# 
# Standardized parameters were obtained by fitting the model on a standardized
# version of the dataset. 95% Confidence Intervals (CIs) and p-values were
# computed using a Wald t-distribution approximation.

Installation

The package is available on CRAN and can be downloaded by running:

install.packages("report")

If you would instead like to experiment with the development version, you can download it from GitHub:

install.packages("remotes")
remotes::install_github("easystats/report") # You only need to do that once

Load the package every time you start R

library("report")

Tip

Instead of library(report), use library(easystats). This will make all features of the easystats ecosystem available.

To stay updated, use easystats::install_latest().

Documentation

The package documentation can be found here.

Report all the things

General Workflow

The report package works in a two-step fashion. First, you create a report object with the report() function. Then, this report object can be displayed either textually (the default output) or as a table, using as.data.frame(). Moreover, you can also access a more condensed and compact version of the report by calling summary() on the report object.

[workflow diagram]

The report() function works on a variety of models, as well as other objects such as data frames:

report(iris)
# The data contains 150 observations of the following 5 variables:
# 
#   - Sepal.Length: n = 150, Mean = 5.84, SD = 0.83, Median = 5.80, MAD = 1.04,
# range: [4.30, 7.90], Skewness = 0.31, Kurtosis = -0.55, 0% missing
#   - Sepal.Width: n = 150, Mean = 3.06, SD = 0.44, Median = 3.00, MAD = 0.44,
# range: [2, 4.40], Skewness = 0.32, Kurtosis = 0.23, 0% missing
#   - Petal.Length: n = 150, Mean = 3.76, SD = 1.77, Median = 4.35, MAD = 1.85,
# range: [1, 6.90], Skewness = -0.27, Kurtosis = -1.40, 0% missing
#   - Petal.Width: n = 150, Mean = 1.20, SD = 0.76, Median = 1.30, MAD = 1.04,
# range: [0.10, 2.50], Skewness = -0.10, Kurtosis = -1.34, 0% missing
#   - Species: 3 levels, namely setosa (n = 50, 33.33%), versicolor (n = 50,
# 33.33%) and virginica (n = 50, 33.33%)

These reports work nicely within the tidyverse workflow:

library(dplyr) # provides %>%, select() and group_by()

iris %>%
  select(-starts_with("Sepal")) %>%
  group_by(Species) %>%
  report() %>%
  summary()
# The data contains 150 observations, grouped by Species, of the following 3
# variables:
# 
# - setosa (n = 50):
#   - Petal.Length: Mean = 1.46, SD = 0.17, range: [1, 1.90]
#   - Petal.Width: Mean = 0.25, SD = 0.11, range: [0.10, 0.60]
# 
# - versicolor (n = 50):
#   - Petal.Length: Mean = 4.26, SD = 0.47, range: [3, 5.10]
#   - Petal.Width: Mean = 1.33, SD = 0.20, range: [1, 1.80]
# 
# - virginica (n = 50):
#   - Petal.Length: Mean = 5.55, SD = 0.55, range: [4.50, 6.90]
#   - Petal.Width: Mean = 2.03, SD = 0.27, range: [1.40, 2.50]

t-tests and correlations

Reports can be used to automatically format tests like t-tests or correlations.

report(t.test(mtcars$mpg ~ mtcars$am))
# Effect sizes were labelled following Cohen's (1988) recommendations.
# 
# The Welch Two Sample t-test testing the difference of mtcars$mpg by mtcars$am
# (mean in group 0 = 17.15, mean in group 1 = 24.39) suggests that the effect is
# negative, statistically significant, and large (difference = -7.24, 95% CI
# [-11.28, -3.21], t(18.33) = -3.77, p = 0.001; Cohen's d = -1.41, 95% CI [-2.26,
# -0.53])

As mentioned, you can also create tables with the as.data.frame() function, for example with this correlation test:

cor.test(iris$Sepal.Length, iris$Sepal.Width) %>%
  report() %>%
  as.data.frame()
# Pearson's product-moment correlation
# 
# Parameter1        |       Parameter2 |     r |        95% CI | t(148) |     p
# -----------------------------------------------------------------------------
# iris$Sepal.Length | iris$Sepal.Width | -0.12 | [-0.27, 0.04] |  -1.44 | 0.152
# 
# Alternative hypothesis: two.sided

ANOVAs

This works great with ANOVAs, as it includes effect sizes and their interpretation.

aov(Sepal.Length ~ Species, data = iris) %>%
  report()
# The ANOVA (formula: Sepal.Length ~ Species) suggests that:
# 
#   - The main effect of Species is statistically significant and large (F(2, 147)
# = 119.26, p < .001; Eta2 = 0.62, 95% CI [0.54, 1.00])
# 
# Effect sizes were labelled following Field's (2013) recommendations.

General Linear Models (GLMs)

Reports are also compatible with GLMs, such as this logistic regression:

model <- glm(vs ~ mpg * drat, data = mtcars, family = "binomial")

report(model)
# We fitted a logistic model (estimated using ML) to predict vs with mpg and drat
# (formula: vs ~ mpg * drat). The model's explanatory power is substantial
# (Tjur's R2 = 0.51). The model's intercept, corresponding to mpg = 0 and drat =
# 0, is at -33.43 (95% CI [-77.90, 3.25], p = 0.083). Within this model:
# 
#   - The effect of mpg is statistically non-significant and positive (beta = 1.79,
# 95% CI [-0.10, 4.05], p = 0.066; Std. beta = 3.63, 95% CI [1.36, 7.50])
#   - The effect of drat is statistically non-significant and positive (beta =
# 5.96, 95% CI [-3.75, 16.26], p = 0.205; Std. beta = -0.36, 95% CI [-1.96,
# 0.98])
#   - The interaction effect of drat on mpg is statistically non-significant and
# negative (beta = -0.33, 95% CI [-0.83, 0.15], p = 0.141; Std. beta = -1.07, 95%
# CI [-2.66, 0.48])
# 
# Standardized parameters were obtained by fitting the model on a standardized
# version of the dataset. 95% Confidence Intervals (CIs) and p-values were
# computed using a Wald z-distribution approximation.

Mixed Models

Mixed models, whose popularity and usage are growing rapidly, can also be reported:

library(lme4)

model <- lme4::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris)

report(model)
# We fitted a linear mixed model (estimated using REML and nloptwrap optimizer)
# to predict Sepal.Length with Petal.Length (formula: Sepal.Length ~
# Petal.Length). The model included Species as random effect (formula: ~1 |
# Species). The model's total explanatory power is substantial (conditional R2 =
# 0.97) and the part related to the fixed effects alone (marginal R2) is of 0.66.
# The model's intercept, corresponding to Petal.Length = 0, is at 2.50 (95% CI
# [1.19, 3.82], t(146) = 3.75, p < .001). Within this model:
# 
#   - The effect of Petal Length is statistically significant and positive (beta =
# 0.89, 95% CI [0.76, 1.01], t(146) = 13.93, p < .001; Std. beta = 1.89, 95% CI
# [1.63, 2.16])
# 
# Standardized parameters were obtained by fitting the model on a standardized
# version of the dataset. 95% Confidence Intervals (CIs) and p-values were
# computed using a Wald t-distribution approximation.

Bayesian Models

Bayesian models can also be reported using the new SEXIT framework, which combines clarity, precision and usefulness.

library(rstanarm)

model <- stan_glm(mpg ~ qsec + wt, data = mtcars)

report(model)
# We fitted a Bayesian linear model (estimated using MCMC sampling with 4 chains
# of 1000 iterations and a warmup of 500) to predict mpg with qsec and wt
# (formula: mpg ~ qsec + wt). Priors over parameters were set as normal (mean =
# 0.00, SD = 8.43) distributions. The model's explanatory power is substantial
# (R2 = 0.81, 95% CI [0.70, 0.90], adj. R2 = 0.79). The model's intercept,
# corresponding to qsec = 0 and wt = 0, is at 19.72 (95% CI [9.18, 29.63]).
# Within this model:
# 
#   - The effect of qsec (Median = 0.92, 95% CI [0.42, 1.46]) has a 99.90%
# probability of being positive (> 0), 99.00% of being significant (> 0.30), and
# 0.15% of being large (> 1.81). The estimation successfully converged (Rhat =
# 1.000) and the indices are reliable (ESS = 2411)
#   - The effect of wt (Median = -5.04, 95% CI [-6.00, -4.02]) has a 100.00%
# probability of being negative (< 0), 100.00% of being significant (< -0.30),
# and 100.00% of being large (< -1.81). The estimation successfully converged
# (Rhat = 1.000) and the indices are reliable (ESS = 2582)
# 
# Following the Sequential Effect eXistence and sIgnificance Testing (SEXIT)
# framework, we report the median of the posterior distribution and its 95% CI
# (Highest Density Interval), along the probability of direction (pd), the
# probability of significance and the probability of being large. The thresholds
# beyond which the effect is considered as significant (i.e., non-negligible) and
# large are |0.30| and |1.81| (corresponding respectively to 0.05 and 0.30 of the
# outcome's SD). Convergence and stability of the Bayesian sampling has been
# assessed using R-hat, which should be below 1.01 (Vehtari et al., 2019), and
# Effective Sample Size (ESS), which should be greater than 1000 (Burkner, 2017).

Other types of reports

Specific parts

For complex reports, one can directly access specific parts of the report:

model <- lm(Sepal.Length ~ Species, data = iris)

report_model(model)
report_performance(model)
report_statistics(model)
# linear model (estimated using OLS) to predict Sepal.Length with Species (formula: Sepal.Length ~ Species)
# The model explains a statistically significant and substantial proportion of
# variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61)
# beta = 5.01, 95% CI [4.86, 5.15], t(147) = 68.76, p < .001; Std. beta = -1.01, 95% CI [-1.18, -0.84]
# beta = 0.93, 95% CI [0.73, 1.13], t(147) = 9.03, p < .001; Std. beta = 1.12, 95% CI [0.88, 1.37]
# beta = 1.58, 95% CI [1.38, 1.79], t(147) = 15.37, p < .001; Std. beta = 1.91, 95% CI [1.66, 2.16]

Report participants’ details

This can be useful to complete the Participants paragraph of your manuscript.

data <- data.frame(
  "Age" = c(22, 23, 54, 21),
  "Sex" = c("F", "F", "M", "M")
)

paste(
  report_participants(data, spell_n = TRUE),
  "were recruited in the study by means of torture and coercion."
)
# [1] "Four participants (Mean age = 30.0, SD = 16.0, range: [21, 54]; Sex: 50.0% females, 50.0% males, 0.0% other) were recruited in the study by means of torture and coercion."

Report sample

Report can also help you create a sample description table (also referred to as Table 1).

Variable               | setosa (n=50) | versicolor (n=50) | virginica (n=50) | Total (n=150)
-----------------------|---------------|-------------------|------------------|--------------
Mean Sepal.Length (SD) | 5.01 (0.35)   | 5.94 (0.52)       | 6.59 (0.64)      | 5.84 (0.83)
Mean Sepal.Width (SD)  | 3.43 (0.38)   | 2.77 (0.31)       | 2.97 (0.32)      | 3.06 (0.44)
Mean Petal.Length (SD) | 1.46 (0.17)   | 4.26 (0.47)       | 5.55 (0.55)      | 3.76 (1.77)
Mean Petal.Width (SD)  | 0.25 (0.11)   | 1.33 (0.20)       | 2.03 (0.27)      | 1.20 (0.76)

Report system and packages

Finally, report includes some functions to help you write the data analysis paragraph about the tools used.

report(sessionInfo())
# Analyses were conducted using the R Statistical language (version 4.2.1; R Core
# Team, 2022) on macOS Monterey 12.6, using the packages lme4 (version 1.1.30;
# Bates D et al., 2015), Matrix (version 1.5.1; Bates D et al., 2022), Rcpp
# (version 1.0.9; Eddelbuettel D, François R, 2011), rstanarm (version 2.21.3;
# Goodrich B et al., 2022), report (version 0.5.5.2; Makowski D et al., 2021) and
# dplyr (version 1.0.10; Wickham H et al., 2022).
# 
# References
# ----------
#   - Bates D, Mächler M, Bolker B, Walker S (2015). "Fitting Linear Mixed-Effects
# Models Using lme4." _Journal of Statistical Software_, *67*(1), 1-48.
# doi:10.18637/jss.v067.i01 <https://doi.org/10.18637/jss.v067.i01>.
#   - Bates D, Maechler M, Jagan M (2022). _Matrix: Sparse and Dense Matrix Classes
# and Methods_. R package version 1.5-1,
# <https://CRAN.R-project.org/package=Matrix>.
#   - Eddelbuettel D, François R (2011). "Rcpp: Seamless R and C++ Integration."
# _Journal of Statistical Software_, *40*(8), 1-18. doi:10.18637/jss.v040.i08
# <https://doi.org/10.18637/jss.v040.i08>. Eddelbuettel D (2013). _Seamless R and
# C++ Integration with Rcpp_. Springer, New York. doi:10.1007/978-1-4614-6868-4
# <https://doi.org/10.1007/978-1-4614-6868-4>, ISBN 978-1-4614-6867-7.
# Eddelbuettel D, Balamuta JJ (2018). "Extending R with C++: A Brief Introduction
# to Rcpp." _The American Statistician_, *72*(1), 28-36.
# doi:10.1080/00031305.2017.1375990 <https://doi.org/10.1080/00031305.2017.1375990>.
#   - Goodrich B, Gabry J, Ali I, Brilleman S (2022). "rstanarm: Bayesian applied
# regression modeling via Stan." R package version 2.21.3,
# <https://mc-stan.org/rstanarm/>. Brilleman S, Crowther M, Moreno-Betancur M,
# Buros Novik J, Wolfe R (2018). "Joint longitudinal and time-to-event models
# via Stan." StanCon 2018. 10-12 Jan 2018. Pacific Grove, CA, USA.
# <https://github.com/stan-dev/stancon_talks/>.
#   - Makowski D, Ben-Shachar M, Patil I, Lüdecke D (2021). "Automated Results
# Reporting as a Practical Tool to Improve Reproducibility and Methodological Best
# Practices Adoption." _CRAN_. <https://github.com/easystats/report>.
#   - R Core Team (2022). _R: A Language and Environment for Statistical Computing_.
# R Foundation for Statistical Computing, Vienna, Austria.
# <https://www.R-project.org/>.
#   - Wickham H, François R, Henry L, Müller K (2022). _dplyr: A Grammar of Data
# Manipulation_. R package version 1.0.10,
# <https://CRAN.R-project.org/package=dplyr>.

Credits

If you like it, you can put a star on this repo, and cite the package as follows:

citation("report")

To cite in publications use:

  Makowski, D., Ben-Shachar, M.S., Patil, I. & Lüdecke, D. (2020).
  Automated Results Reporting as a Practical Tool to Improve
  Reproducibility and Methodological Best Practices Adoption. CRAN.
  Available from https://github.com/easystats/report. doi: .

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption},
    author = {Dominique Makowski and Mattan S. Ben-Shachar and Indrajeet Patil and Daniel Lüdecke},
    year = {2021},
    journal = {CRAN},
    url = {https://github.com/easystats/report},
  }

Contribute

report is a young package in need of affection. You can easily become part of the developer community of this open-source software and improve science! Don't be shy: try to code and submit a pull request (see the contributing guide). Even if it's not perfect, we will help you make it great!

Code of Conduct

Please note that the report project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Download Details:

Author: Easystats
Source Code: https://github.com/easystats/report 
License: GPL-3.0 license

#r #report #models #rstats 


Opensourcepos: Open Source Point Of Sale

Open Source Point of Sale

👋 Introduction

Open Source Point of Sale is a web-based point of sale system. The application is written in PHP, it uses MySQL (or MariaDB) as data storage back-end and has a simple but intuitive user interface.

The latest 3.x version is a complete overhaul of the original software. It uses CodeIgniter 3 as a framework and is based on Bootstrap 3 with Bootswatch themes, along with improved functionality and security.

The features include:

  • Stock management (items and kits with an extensible list of attributes)
  • VAT, GST, customer, and multi-tier taxation
  • Sale register with transactions logging
  • Quotation and invoicing
  • Expenses logging
  • Cash up function
  • Printing and emailing of receipts, invoices and quotations
  • Barcode generation and printing
  • Database of customers and suppliers
  • Multiuser with permission control
  • Reporting on sales, orders, expenses, inventory status and more
  • Receivings
  • Gift cards
  • Rewards
  • Restaurant tables
  • Messaging (SMS)
  • Multilanguage
  • Selectable Bootstrap based UI theme with Bootswatch
  • Mailchimp integration
  • Optional Google reCAPTCHA to protect login page from brute force attacks
  • GDPR ready

🧪 Live Demo

We've got a live version of our latest master running for you to play around with and test everything out. It's a containerized install that will reinitialize when new functionality is merged into our code repository.

You can find the demo here and log in with these credentials.
👤 Username admin
🔒 Password pointofsale

If you bump into an issue, please check the status page here to confirm if the server is up and running.

🖥️ Development Demo

Besides the demo of the latest master, we also have a development server that builds when there's a new commit to our repository. It's mainly used for testing out new code before merging it into the master. It can be found here.

The log in credentials are the same as the regular live demo.

💾 Installation

Please refrain from creating issues about installation problems before reading the FAQ and going through existing GitHub issues. We have a build pipeline that checks the sanity of our latest repository commit; if the application itself is broken, our build will be too.

This application can be set up in many different ways and we only support the ones described in the INSTALL.md file.

For more information and recommendations on support hardware, like receipt printers and barcode scanners, read this page on our wiki.

✨ Contributing

Everyone is more than welcome to help us improve this project. If you think you've got something to help us go forward, feel free to open a pull request.

Want to help translate Open Source Point of Sale into your language? You can find our Weblate here; sign up and start translating. You can subscribe to different languages to receive a notification once a new string is added or needs updating. Have a look at our guidelines below to help you get started.

Only with the help of the community can we keep language translations up to date. Thanks!

🐛 Reporting Bugs

Before creating a new issue, in most cases you'll need to copy and include the info under the System Info tab in the configuration section. If that information is not provided in full, your issue might be tagged as pending.

If you're reporting a potential security issue, please refer to our security policy found in the SECURITY.md file.

NOTE: If you're running non-release code, please make sure you always run the latest database upgrade script and you download the latest master code.

📖 FAQ

If you get the message "system folder missing", then you have cloned the source using git and need to run a build first. Check INSTALL.md for instructions, or download the latest zip file from GitHub releases instead.

If at login time you read "The installation is not correct, check your php.ini file.", please check the error_log in the public folder to understand what's wrong, and make sure you read INSTALL.md. To learn how to enable error_log, please read the comment in issue #1770.

If you installed your OSPOS under a web server subdirectory, please edit public/.htaccess, go to the lines with the comments "if in web root" or "if in subdir", uncomment one, replace <OSPOS path> with your path, and follow the instruction on the second comment line. If you face more issues, please read issue #920 for more information.

Apache server configuration is a SysAdmin issue and not strictly related to OSPOS. Please make sure you can serve a "Hello world" HTML page before pointing to the OSPOS public directory, and make sure .htaccess is correctly configured.

If the avatar pictures are not shown in items, or you get an error on item save, please make sure your public directory and its subdirectories are assigned to the correct owner and the access permission is set to 750.

If you install OSPOS in Docker behind a proxy that performs SSL offloading, you can make the generated URLs use HTTPS instead of HTTP by setting the environment variable FORCE_HTTPS=1.
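As a sketch, the variable can be passed to the container like any other environment variable. The image name and port mapping below are illustrative assumptions, not taken from the docs; use whatever image and ports your setup actually deploys:

```shell
# Illustrative only: start an OSPOS container behind an SSL-offloading
# proxy with FORCE_HTTPS=1 so generated URLs use https://.
docker run -d \
  -e FORCE_HTTPS=1 \
  -p 80:80 \
  opensourcepos/opensourcepos
```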

If you have suhosin installed and face an issue with CSRF, please make sure you read issue #1492.

PHP 8.0 is not currently supported, see issue #3051.

PHP 5.5 and 5.6 are no longer supported because they are deprecated and not safe to use from a security point of view.

🏃 Keep the Machine Running

If you like our project, please consider buying us a coffee through the button below so we can keep adding features.

Donate
Or refer to the FUNDING.yml file.

If you choose to deploy OSPOS in the cloud, you can contribute to the project by using DigitalOcean and signing up through our referral link. You'll receive a free $200, 60-day credit if you run OSPOS in a DigitalOcean droplet through our referral link.

📄 License

Open Source Point of Sale is licensed under MIT terms with an important addition:

The footer signature "© 2010 - current year · opensourcepos.org · 3.x.x - hash" including the version, hash and link our website MUST BE RETAINED, MUST BE VISIBLE IN EVERY PAGE and CANNOT BE MODIFIED.

Also worth noting:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

For more details please read the LICENSE file.

It's important to understand that although you are free to use the application, the copyright has to stay and the license agreement applies in all cases. Therefore, actions like the following are prohibited:

  • Removing LICENSE and/or any license files is prohibited
  • Altering the footer notice, replacing it with your own, or, even worse, claiming the copyright is absolutely prohibited
  • Claiming full ownership of the code is prohibited

In short, you are free to use the application but you cannot claim any property on it.

Any person or company found breaching the license agreement might find a bunch of monkeys at the door ready to destroy their servers.

Download Details:

Author: opensourcepos
Source Code: https://github.com/opensourcepos/opensourcepos 
License: View license

#php #report #bootstrap 


Report.jl: A Markdown Report Writer for Julia

Report.jl

Lightweight Markdown report generator for Julia.

The very general idea is that you can create markdown-formatted reports from within Julia code. Potentially helpful when running a data analysis pipeline that creates tables and plots as output. Uses pandoc Markdown and some of its extensions.

Some examples:

using Report
# create a Markdown document
doc = Report.Markdown("Report.md", "w", "figures")

# add a header to the document 
write(doc, Report.Header(1, "Report on Report.jl"))

# do some stuff, read in data, plot something
# Table(nrows, ncolumns, header, data, caption) creates a simple_table
write(doc, Report.Table(6, 3, ["Col1","Col2","Col3"], data, "Example table"))

# add a plot that was stored in `filename`
write(doc, Report.Figure(filename, "Yet another plot"))

# add some julia code to help you remember what you have done (uses fenced_code_blocks)

code = """
doc = Report.Markdown("Report.md", "w", "figures")
write(doc, Report.Header(1, "Report on Report.jl"))
write(doc, Report.Table(6, 3, ["Col1","Col2","Col3"], data, "Example table"))
write(doc, Report.Figure(filename, "Yet another plot"))
"""

write(doc, Report.Code("julia", code))

Download Details:

Author: Sveme
Source Code: https://github.com/sveme/Report.jl 
License: View license

#julia #markdown #report 

Dexter Goodwin

1661172360

Ackee-report: CLI tool To Generate Performance Reports Of Websites

ackee-report

CLI tool to generate performance reports of websites using the self-hosted analytics tool Ackee.

preview

👋 Introduction

ackee-report lets you create monthly website performance reports from your Ackee analytics data and either send them via email, generate an RSS feed, or output them to a JSON file. It uses Ackee's GraphQL API and can be configured to generate multiple reports for different websites, data ranges, and recipients.

🚀 Get started

Install ackee-report globally via npm:

npm install ackee-report -g

ackee-report@latest requires Ackee >=v3.1.1. Use ackee-report@v1.1.3 for older versions of Ackee.

After that ackee-report is ready to be used 🎉

⚙️ Configuration

On the first run ackee-report will ask you to input a few values:

  • ackee server / ACKEE_SERVER - the endpoint of your Ackee instance
  • ackee token / ACKEE_TOKEN - a permanent Ackee token (can be used instead of username and password, more info)
  • ackee username / ACKEE_USERNAME - your Ackee username (more info)
  • ackee password / ACKEE_PASSWORD - your Ackee password (more info)
  • email host / EMAIL_HOST - the domain of the email server (more info)
  • email port / EMAIL_PORT - the port of the email server (more info)
  • email username / EMAIL_USERNAME - the username to use with the email server (more info)
  • email password / EMAIL_PASSWORD - the password to use with the email server (more info)
  • email from / EMAIL_FROM - the from address to use (more info)

The configuration will be stored in your home directory under ~/.config/configstore/ackee-report.json and can be changed at any point.

Environment Variables

If you don't want to interact with ackee-report via the CLI interface, you can also specify each configuration option as an environment variable, e.g. ACKEE_TOKEN=<token>.
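For example, a fully non-interactive run could pass the whole configuration this way. This is a sketch built from the option names listed above; the server, token, and addresses are placeholders:

```shell
# Run a one-off email report with all configuration supplied
# via environment variables (token-based authentication).
ACKEE_SERVER="https://ackee.example.com" \
ACKEE_TOKEN="<token>" \
EMAIL_HOST="smtp.example.com" \
EMAIL_PORT=465 \
EMAIL_USERNAME="username@example.com" \
EMAIL_PASSWORD="<password>" \
EMAIL_FROM="Ackee <username@example.com>" \
ackee-report email -d example.com -t hello@example.com
```

This pattern is handy in CI jobs or containers where no interactive prompt is available.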

Ackee API authentication

ackee-report needs access to your Ackee instance via the API in order to get all the data it needs to generate the report. You can choose either of the two authentication methods below:

Username and password:

Enter the username and password you use to log in to the Ackee UI on the first run of ackee-report, or change them in the config file later.

ackee-report will then use them to create a temporary access token each time it runs and use it to query the Ackee API.

Permanent access token (recommended):

The recommended way of authenticating ackee-report is with a permanent access token (only available with Ackee v2.2.0 or later).

You will have to create one via the Ackee UI under Settings/Permanent Tokens, then click on New permanent token and give it a name (doesn't matter what).

Copy the permanent token id and enter it on the first run of ackee-report or add it to the config file under ackee.token later.

The same token will then be used each time ackee-report runs to query the Ackee API.

Email setup

If you want to send your report via email, you have to specify your email provider's SMTP server and credentials, as well as the from option:

  • Host - smtp.example.com
  • Port - 465
  • Username - username@example.com
  • Password - password
  • From - username@example.com or Ackee <username@example.com>

Note: For port 465 ackee-report will use TLS when connecting to your email server, on all other ports it will use STARTTLS (#44)

Common providers:

Gmail

If you use Gmail to send emails, use these values:

  • Host - smtp.gmail.com
  • Port - 465
  • Username - your Gmail username (your email address)
  • Password - your Gmail password or, if you have 2FA enabled, an "Application Specific password"

SendGrid

If you use SendGrid to send emails, use these values:

  • Host - smtp.sendgrid.net
  • Port - 465
  • Username - apikey (everyone's username is literally apikey)
  • Password - your API Key (generate one here)

MailGun

If you use Mailgun to send emails, use these values:

  • Host - smtp.mailgun.org
  • Port - 465
  • Username - postmaster@yourdomain.name
  • Password - get your password here

📚 Usage

Usage: ackee-report <command> [options]

CLI tool to generate performance reports of websites using the self-hosted analytics tool Ackee.

Commands:
  email [options]             Generate report and send it via email
  json [options]              Query API for data and output it to JSON file
  html [options]              Generate report and output it to an HTML file
  rss|xml [options]           Generate report as an RSS feed
  domains|domain [titles...]  get domain id by title
  config                      output current config
  help [command]              display help for command

Options:
  General:
    -d, --domain <titles...>    specify domains by title
    -i, --id <ids...>           specify domains by id
    -r, --range <range>         specify data range (default: "month")
    -l, --limit <number>        limit number of list items (default: 3)
    -e, --events [type]         get event data (default: false)
    -v, --version               output the version number
    -h, --help                  display help for command

  Email:
    -t, --to <recipient...>     to whom the report should be sent

  RSS/JSON/HTML:
    -o, --output <file>         path to output file (default: "report.[xml/json/html]")

Example call:
  $ ackee-report email --domain example.com --to hello@example.com

If you want to send the report periodically, you have to set up a cron job which runs the command at a set interval (example below).

🛠️ Examples

Generate a report for one domain and send it via email

ackee-report email -d example.com -t hello@example.com

This will generate a report for the domain example.com and send it via email to hello@example.com.

Multiple domains and recipients

ackee-report email -d example.com example2.com -t hello@example.com hey@example2.com

Include events in report

ackee-report email -d all -t hello@example.com -e

Average event type

ackee-report email -d all -t hello@example.com -e avg

Custom range

You can specify the data range of the report with the -r/--range option:

ackee-report email -d example.com -t hello@example.com -r day

Available values: day/week/month/six_months.

Note: The total views/range value is calculated by counting the views in the last x days before the program runs. For the month range, for example, it is not the number of views in the current calendar month, but the number of views in the last 30 days.
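The rolling-window behaviour described in the note can be illustrated with a small Python sketch (our own helper, not part of ackee-report; the range names mirror the -r option, with six_months approximated as 180 days):

```python
from datetime import datetime, timedelta

# Rolling window length in days for each range name.
RANGE_DAYS = {"day": 1, "week": 7, "month": 30, "six_months": 180}

def views_in_range(view_times, range_name, now=None):
    """Count views in the last N days, counted back from when the report runs."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=RANGE_DAYS[range_name])
    return sum(1 for t in view_times if t >= cutoff)

# A view from mid-February still counts for "month" when the report runs on
# March 15th, even though it happened in the previous calendar month.
run_at = datetime(2024, 3, 15)
views = [datetime(2024, 3, 14), datetime(2024, 2, 20), datetime(2024, 1, 1)]
print(views_in_range(views, "month", run_at))  # → 2
```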

Send the report periodically (cron)

To send a report periodically, for example every month on the 28th at 23:55, set up a cron job like this:

55 23 28 * * ackee-report email -d example.com -t hello@example.com >> /tmp/ackee-report.log 2>&1

Note: We use the 28th as running on the last day of the month is very complicated with cron and Ackee resets its data on the first day of each month.

Note: You may have to specify the actual path to ackee-report. In that case, replace ackee-report in the command above with the output of which ackee-report.

If you are not familiar with cron, here's a tutorial on how to get started.

To send multiple reports to different people, add them all as separate cron jobs.

Generate RSS feed

You can generate an RSS feed/XML file instead of sending the report via email:

ackee-report rss -d example.com -o output.xml

Output the report to a JSON file

You can also save the report in a JSON file:

ackee-report json -d example.com -o output.json

Output the email report to a HTML file

You can also save the report, which is normally sent via email, directly to an HTML file:

ackee-report html -d example.com -o index.html

🖼️ Screenshot

Full Report

💻 Development

Issues and PRs are very welcome!

Run yarn lint or npm run lint to use ESLint.

Please check out the contributing guide before you start.

To see differences with previous versions refer to the CHANGELOG.

❔ About

This library is an extension to the awesome privacy focused analytics tool Ackee.

Ackee was developed by @electerious, if you want to support him and the development of Ackee visit the Donate section on the Ackee repository.

Download Details:

Author: BetaHuhn
Source Code: https://github.com/BetaHuhn/ackee-report 
License: MIT license

#javascript #report #cli #privacy #analytics 

Ackee-report: CLI tool To Generate Performance Reports Of Websites

Beth Cooper

1659701760

Fast Read and Highly Scalable Optimized Social Activity Feed in Ruby

SimpleFeed

1. Scalable, Easy to Use Activity Feed Implementation.

Note: Please feel free to read this README in the formatted-for-print PDF Version.

1.1. Build & Gem Status

1.2. Test Coverage Map

Coverage Map

Important: Please read the (somewhat outdated) blog post Feeding Frenzy with SimpleFeed launching this library. Please leave comments or questions in the discussion thread at the bottom of that post. Thanks!

If you like to see this project grow, your donation of any amount is much appreciated.

Donate

This is a fast, pure-ruby implementation of an activity feed concept commonly used in social networking applications. The implementation is optimized for read-time performance and high concurrency (lots of users), and can be extended with custom backend providers. One data provider comes bundled: the production-ready Redis provider.

Important Notes and Acknowledgements:

SimpleFeed does not depend on Ruby on Rails and is a pure-ruby implementation

SimpleFeed requires MRI Ruby 2.3 or later

SimpleFeed is currently live in production

SimpleFeed is open source thanks to the generosity of Simbi, Inc.

2. Features

SimpleFeed is a Ruby Library that can be plugged into any application to power a fast, Redis-based activity feed implementation so common on social networking sites. SimpleFeed offers the following features:

Modelled after graph-relationships similar to those on Twitter (bi-directional independent follow relationships):

Feed maintains a reverse-chronological order for heterogeneous events for each user.

It offers a constant-time lookup for a user’s feed, avoiding complex SQL joins to render it.

An API to read/paginate the feed for a given user

As well as to query the total unread items in the feed since it was last read by the user (typically shown on App icons).

Scalable and well performing Redis-based activity feed —

Scales to millions of users (will need to use Twemproxy to shard across several Redis instances)

Stores a fixed number of events for each unique "user" — the default is 1000. When the feed reaches 1001 events, the oldest event is offloaded from the activity.

Implementation properties:

Fully thread-safe implementation; writing events can be done, e.g., in Sidekiq.

Zero assumptions about what you are storing: the "data" is just a string. Serialize it with JSON, Marshal, YAML, or whatever.

You can create as many different types of feeds per application as you like (no Ruby Singletons used).

Customize mapping from user_id to the activity id based on your business logic (more on this later).

2.1. Publishing Events

Pushing events to the feed requires the following:

An Event consisting of:

String data that, most commonly, is a foreign key to a database table, but can really be anything you like.

Float at (typically, the timestamp, but can be any float number)

One or more user IDs, or event consumers: basically — who should see the event being published in their feed.

You publish an event by choosing a set of users whose feed should be updated. For example, were you re-implementing Twitter, your array of user_ids when publishing an event would be all followers of the Tweet’s author. While the data would probably be the Tweet ID.

Note: Publishing an event to the feeds of N users is roughly an O(N * log(N)) operation
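The fan-out publishing model can be sketched in plain Python (a toy model of the concept, not SimpleFeed's Redis implementation; names like publish and MAX_FEED_SIZE are ours):

```python
import bisect

MAX_FEED_SIZE = 1000  # mirrors SimpleFeed's default per-user cap

feeds = {}  # user_id -> list of (at, data) tuples kept sorted by timestamp

def publish(user_ids, data, at):
    """Fan one event out to every consumer's feed, evicting the oldest
    event whenever a feed grows past the cap."""
    for uid in user_ids:
        feed = feeds.setdefault(uid, [])
        bisect.insort(feed, (at, data))  # binary-search insert keeps time order
        if len(feed) > MAX_FEED_SIZE:
            feed.pop(0)                  # offload the oldest event

# Re-implementing Twitter: the user_ids would be the author's followers,
# and the data would be the Tweet ID.
publish(user_ids=[1, 2, 3], data="tweet:42", at=1480475294.0)
```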

2.2. Consuming Events (Reading / Rendering the Feed)

You can fetch the chronologically ordered events for a particular user, using:

Methods on the activity such as paginate, fetch.

Reading feed for one user (or one type of user) is a O(1) operation

For each activity (user) you can fetch the total_count and the unread_count — the number of total and new items in the feed, where unread_count is computed since the user last reset their read status.

Note: total_count can never exceed the maximum size of the feed that you configured. The default is 1000 items.

The last_read timestamp can be automatically reset when the user is shown the feed via the paginate method (whether or not it is reset is controlled via a method argument).
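The read-status bookkeeping above works out to a few lines when modelled in Python (again an illustrative toy, not the gem's code): unread_count is just the number of events newer than last_read, and paginating with reset moves last_read forward.

```python
import time

class Activity:
    """Minimal model of per-user read-status bookkeeping."""

    def __init__(self):
        self.events = []      # (at, data) tuples
        self.last_read = 0.0  # when the user last viewed the feed

    def store(self, data, at):
        self.events.append((at, data))

    def unread_count(self):
        # Only events newer than the last viewing count as unread.
        return sum(1 for at, _ in self.events if at > self.last_read)

    def paginate(self, reset_last_read=False):
        if reset_last_read:
            self.last_read = time.time()
        return sorted(self.events, reverse=True)  # newest first

activity = Activity()
activity.store("hello", at=1.0)
activity.store("goodbye", at=2.0)
print(activity.unread_count())           # → 2
activity.paginate(reset_last_read=True)  # viewing the feed resets last_read
print(activity.unread_count())           # → 0
```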

2.3. Modifying User’s Feed

For any given user, you can:

Wipe their feed with wipe

Selectively remove items from the feed with delete_if.

For instance, if a user un-follows someone they shouldn’t see their events anymore, so you’d have to call delete_if and remove any events published by the unfollowed user.

2.4. Aggregating Events

This is a feature planned for future versions.

Help is much appreciated, even if you are not a developer but have a clear idea about how it should work.

3. Commercial & Enterprise Support

Commercial support plans are available for SimpleFeed through the author’s consulting company, ReinventONE Inc. Please reach out to kig AT reinvent.one for more information.

4. Usage

4.1. Example

Please read the additional documentation, including the examples, on the project’s Github Wiki.

Below is a screen shot of an actual activity feed powered by this library.

usage

4.2. Providers

A key concept to understanding SimpleFeed gem, is that of a provider, which is effectively a persistence implementation for the events belonging to each user.

One provider is supplied with this gem: the production-ready :redis provider, which uses the sorted set Redis data type to store and fetch the events, typically scored by time (though any float score will do).

You initialize a provider by using the SimpleFeed.provider([Symbol]) method.

4.3. Configuration

Below we configure a feed called :newsfeed, which in this example will be populated with the various events coming from the followers.

require 'simplefeed'

# Let's define a Redis-based feed, and wrap Redis in a in a ConnectionPool.

SimpleFeed.define(:newsfeed) do |f|
  f.provider   = SimpleFeed.provider(:redis,
                                      redis: -> { ::Redis.new },
                                      pool_size: 10)
  f.per_page   = 50     # default page size
  f.batch_size = 10     # default batch size
  f.namespace  = 'nf'   # only needed if you use the same redis for more than one feed
end

After the feed is defined, the gem creates a similarly named method under the SimpleFeed namespace to access the feed. For example, given a name such as :newsfeed the following are all valid ways of accessing the feed:

SimpleFeed.newsfeed

SimpleFeed.get(:newsfeed)

You can also get a full list of currently defined feeds with SimpleFeed.feed_names method.
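The define-then-access-by-name registry pattern can be sketched in Python (a toy analogue of the Ruby metaprogramming; all names here are ours, not SimpleFeed's):

```python
class FeedRegistry:
    """Toy registry: define a feed once, then access it by its generated name."""
    _feeds = {}

    @classmethod
    def define(cls, name, **options):
        feed = dict(name=name, **options)
        cls._feeds[name] = feed
        # Expose a same-named accessor, like SimpleFeed.newsfeed in the gem.
        setattr(cls, name, classmethod(lambda c, _f=feed: _f))
        return feed

    @classmethod
    def get(cls, name):
        return cls._feeds[name]

    @classmethod
    def feed_names(cls):
        return list(cls._feeds)

FeedRegistry.define("newsfeed", per_page=50)
print(FeedRegistry.newsfeed()["per_page"])  # → 50
print(FeedRegistry.feed_names())            # → ['newsfeed']
```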

4.4. Reading from and writing to the feed

For the impatient, here is a quick way to get started with the SimpleFeed.

# Let's use the feed we defined earlier and create activity for all followers of the current user
publish_activity = SimpleFeed.newsfeed.activity(@current_user.followers.map(&:id))

# Store directly the value and the optional time stamp
publish_activity.store(value: 'hello', at: Time.now)
# => true  # indicates that value 'hello' was not yet in the feed (all events must be unique)

# Or, using the event form:
publish_activity.store(event: SimpleFeed::Event.new('good bye', Time.now))
# => true

As we’ve added the two events for these users, we can now read them back, sorted by the time and paginated:

# Let's grab the first follower
user_activity = SimpleFeed.newsfeed.activity(@current_user.followers.first.id)

# Now we can paginate the events, while resetting this user's last-read timestamp:
user_activity.paginate(page: 1, reset_last_read: true)
# [
#     [0] #<SimpleFeed::Event: value=hello, at=1480475294.0579991>,
#     [1] #<SimpleFeed::Event: value=good bye, at=1480472342.8979871>,
# ]
Important: Note that we stored the activity by passing an array of users, but read the activity for just one user. This is how you’d use SimpleFeed most of the time, with the exception of the alternative mapping described below.

4.5. User IDs

In the previous section you saw the examples of publishing events to many feeds, and then reading the activity for a given user.

SimpleFeed supports user IDs that are either numeric (integer) or string-based (eg, UUID). Numeric IDs are best for simplest cases, and are the most compact. String keys offer the most flexibility.

4.5.1. Activity Keys

In the next section we’ll talk about generating keys from user_ids. We mean — Redis Hash keys that uniquely map a user (or a set of users) to the activity feed they should see.

There are up to two keys that are computed depending on the situation:

data_key is used to store the actual feed events

meta_key is used to store user’s last_read status

4.5.2. Partitioning Schema

Note: This feature is only available in SimpleFeed Version 3+.

You can take advantage of string user IDs for situations where, for instance, your feed requires composite keys. Just remember that SimpleFeed does not care about what’s in your user ID, or even what you call "a user". It’s convenient to think of the activities in terms of users, because typically each user has a unique feed that only they see.

But you can just as easily use zip code as the unique activity ID, and create one feed of events per geographical location, that all folks living in that zip code share. But what about other countries?

Now you use a partitioning scheme: make the "user_id" argument a combination of iso_country_code.postal_code. For San Francisco you’d use us.94107, while for Australia you could use, e.g., au.3148.
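Such a composite activity ID is trivial to build and split; here is an illustrative Python helper (the function names are ours, not SimpleFeed's):

```python
def activity_id(iso_country_code, postal_code):
    """Build a composite feed key such as 'us.94107' or 'au.3148'."""
    return f"{iso_country_code.lower()}.{postal_code}"

def split_activity_id(key):
    """Recover the country code and postal code from a composite key."""
    country, postal = key.split(".", 1)
    return country, postal

print(activity_id("US", 94107))      # → us.94107
print(split_activity_id("au.3148"))  # → ('au', '3148')
```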

4.5.3. Relationship between an Activity and a User

One to One

In the most common case, you will have one activity per user.

For instance, in the Twitter example, each Twitter user has a unique feed that only they see.

The events are published when someone posts a tweet, to the array of all users that follow the Tweet author.

One to Many

However, SimpleFeed supports one additional use-case, where you might have one activity shared among many users.

Imagine a service that notifies residents of important announcements based on user’s zip code of residence.

We want this feed to work as follows:

All users that share a zip-code should see the same exact feed.

However, users should never share an individual's last_read status: if two people from the same zip code read the same activity, their unread_count should change independently.

In terms of the activity keys, this means:

data_key should be based on the zip-code of each user, and be one to many with users.

meta_key should be based on the user ID as we want it to be 1-1 with users.

To support this use-case, SimpleFeed supports two optional transformer lambdas that can be applied to each user object when computing their activity feed hash key:

SimpleFeed.define(:zipcode_alerts) do |f|
  f.provider   = SimpleFeed.provider(:redis, redis: -> { ::Redis.new }, pool_size: 10)
  f.namespace  = 'zc'
  f.data_key_transformer = ->(user) { user.zip_code }  # actual feed data is stored once per zip code
  f.meta_key_transformer = ->(user) { user.id }        # last_read status is stored once per user
end

When you publish events into this feed, you would need to provide User objects that all respond to .zip_code method (based on the above configuration). Since the data is only defined by Zip Code, you probably don’t want to be publishing it via a giant array of users. Most likely, you’ll want to publish event based on the zip code, and consume them based on the user ID.

To support this use-case, let’s modify our transformer lambda (only the data one) so that it can support both the consuming read by a user case and the publishing a feed by zip code case. You could do something like this:

  f.data_key_transformer = ->(entity) do
    case entity
      when User
        entity.zip_code.to_i
      when String # UUIDs
        User.find(entity)&.zip_code.to_i
      when ZipCode, Numeric
        entity.to_i
      else
        raise ArgumentError, "Invalid type #{entity.class.name}"
    end
  end

Just make sure that your users always have .zip_code defined, and that ZipCode.new(94107).to_i returns exactly the same thing as @user.zip_code.to_i or your users won’t see the feeds they are supposed to see.

4.6. The Two Forms of the Feed API

The feed API is offered in two forms:

single-user form, and

a batch (multi-user) form.

The method names and signatures are the same. The only difference is in what the methods return:

In the single user case, the return of, say, #total_count is an Integer value representing the total count for this user.

In the multi-user case, the return is a SimpleFeed::Response instance, that can be thought of as a Hash, that has the user IDs as the keys, and return results for each user as a value.

Please see further below the details about the Batch API.

Single-User API

In the examples below we show responses based on a single-user usage. As previously mentioned, the multi-user usage is the same, except what the response values are, and is discussed further down below.

Let’s take a look at a ruby session, which demonstrates return values of the feed operations for a single user:

require 'simplefeed'

# Define the feed using Redis provider, which uses
# SortedSet to keep user's events sorted.
SimpleFeed.define(:followers) do |f|
  f.provider = SimpleFeed.provider(:redis)
  f.per_page = 50
  f.per_page = 2   # overridden for this demo so each page holds two events
end

# Let's get the Activity instance that wraps this
activity = SimpleFeed.followers.activity(user_id)         # => [... complex object removed for brevity ]

# let's clear out this feed to ensure it's empty
activity.wipe                                             # => true

# Let's verify that the counts for this feed are at zero
activity.total_count                                      # => 0
activity.unread_count                                     # => 0

# Store some events
activity.store(value: 'hello')                            # => true
activity.store(value: 'goodbye', at: Time.now - 20)       # => true
activity.unread_count                                     # => 2

# Now we can paginate the events, while resetting this user's last-read timestamp:
activity.paginate(page: 1, reset_last_read: true)
# [
#     [0] #<SimpleFeed::Event: value=goodbye, at=1480475294.0579991>,
#     [1] #<SimpleFeed::Event: value=hello, at=1480475294.057138>
# ]
# Now the unread_count should return 0 since the user just "viewed" the feed.
activity.unread_count                                     # => 0
activity.delete(value: 'hello')                           # => true
# the next method yields to a passed in block for each event in the user's feed, and deletes
# all events for which the block returns true. The return of this call is the
# array of all events that have been deleted for this user.
activity.delete_if do |event, user_id|
  event.value =~ /good/
end
# => [
#     [0] #<SimpleFeed::Event: value=goodbye, at=1480475294.0579991>
# ]
activity.total_count                                      # => 0

You can fetch all items (optionally filtered by time) in the feed using #fetch, #paginate and reset the last_read timestamp by passing the reset_last_read: true as a parameter.

 

Batch (Multi-User) API

This API should be used when dealing with an array of users (or, in the future, a Proc or an ActiveRecord relation).

There are several reasons why this API should be preferred for operations that perform a similar action across a range of users: various provider implementations can be heavily optimized for concurrency and performance.

The Redis Provider, for example, uses a notion of pipelining to send updates for different users asynchronously and concurrently.

Multi-user operations return a SimpleFeed::Response object, which can be used as a hash (keyed on user_id) to fetch the result of a given user.

# Using the Feed API with, eg #find_in_batches
@event_producer.followers.find_in_batches do |group|

  # Convert a group to the array of IDs and get ready to store
  activity = SimpleFeed.get(:followers).activity(group.map(&:id))
  activity.store(value: "#{@event_producer.name} liked an article")

  # => [Response] { user_id1 => [Boolean], user_id2 => [Boolean]... }
  # true if the value was stored, false if it wasn't.
end

Activity Feed DSL (Domain-Specific Language)

The library offers a convenient DSL for adding feed functionality into your current scope.

To use the module, just include SimpleFeed::DSL where needed, which exports just one primary method #with_activity. You call this method and pass an activity object created for a set of users (or a single user), like so:

require 'simplefeed/dsl'
include SimpleFeed::DSL

feed = SimpleFeed.newsfeed
activity = feed.activity(current_user.id)
data_to_store = %w(France Germany England)

def report(value)
  puts value
end

with_activity(activity, countries: data_to_store) do
  # we can use countries as a variable because it was passed above in **opts
  countries.each do |country|
    # we can call #store without a receiver because the block is passed to
    # instance_eval
    store(value: country) { |result| report(result ? 'success' : 'failure') }
    # we can call #report inside the proc because it is evaluated in the
    # outside context of the #with_activity

    # now let's print a color ASCII dump of the entire feed for this user:
    color_dump
  end
  printf "Activity counts are: %d unread of %d total\n", unread_count, total_count
end

The DSL context has access to two additional methods:

#event(value, at) returns a fully constructed SimpleFeed::Event instance

#color_dump prints to STDOUT the ASCII text dump of the current user’s activities (events), as well as the counts and the last_read shown visually on the time line.

#color_dump

Below is an example output of color_dump method, which is intended for the debugging purposes.

sf color dump

Figure 1. #color_dump method output

 

5. Complete API

For completeness' sake we’ll show the multi-user API responses only. For a single-user use-case the response is typically a scalar, and the input is a singular user_id, not an array of ids.

Multi-User (Batch) API

Each API call at this level expects an array of user IDs, therefore the return value is an object, SimpleFeed::Response, containing individual responses for each user, accessible via response[user_id] method.

@multi = SimpleFeed.get(:feed_name).activity(User.active.map(&:id))

@multi.store(value:, at:)
@multi.store(event:)
# => [Response] { user_id => [Boolean], ... } true if the value was stored, false if it wasn't.

@multi.delete(value:, at:)
@multi.delete(event:)
# => [Response] { user_id => [Boolean], ... } true if the value was removed, false if it didn't exist

@multi.delete_if do |event, user_id|
  # if the block returns true, the event is deleted and returned
end
# => [Response] { user_id => [deleted_event1, deleted_event2, ...], ... }

# Wipe the feed for a given user(s)
@multi.wipe
# => [Response] { user_id => [Boolean], ... } true if user activity was found and deleted, false otherwise

# Return a paginated list of all items, optionally with the total count of items
@multi.paginate(page: 1,
                per_page: @multi.feed.per_page,
                with_total: false,
                reset_last_read: false)
# => [Response] { user_id => [Array]<Event>, ... }
# Options:
#   reset_last_read: false — reset last read to Time.now (true), or the provided timestamp
#   with_total: true — returns a hash for each user_id:
#        => [Response] { user_id => { events: Array<Event>, total_count: 3 }, ... }

# Return un-paginated list of all items, optionally filtered
@multi.fetch(since: nil, reset_last_read: false)
# => [Response] { user_id => [Array]<Event>, ... }
# Options:
#   reset_last_read: false — reset last read to Time.now (true), or the provided timestamp
#   since: <timestamp> — if provided, returns all items posted since then
#   since: :last_read — if provided, returns all unread items and resets +last_read+

@multi.reset_last_read
# => [Response] { user_id => [Time] last_read, ... }

@multi.total_count
# => [Response] { user_id => [Integer, String] total_count, ... }

@multi.unread_count
# => [Response] { user_id => [Integer, String] unread_count, ... }

@multi.last_read
# => [Response] { user_id => [Time] last_read, ... }

6. Providers

As we’ve discussed above, a provider is an underlying persistence mechanism implementation.

It is the intention of this gem that:

it should be easy to write new providers

it should be easy to swap out providers

One provider is included with this gem:

6.1. SimpleFeed::Providers::Redis::Provider

Redis Provider is a production-ready persistence adapter that uses the sorted set Redis data type.

This provider is optimized for large writes and can use either a single Redis instance for all users of your application, or any number of Redis shards by using a Twemproxy in front of the Redis shards.

If you set the environment variable REDIS_DEBUG to true and run the example (see below) you will see every operation Redis performs. This could be useful in debugging an issue or submitting a bug report.

7. Running the Examples and Specs

Source code for the gem contains the examples folder with an example file that can be used to test out the providers, and see what they do under the hood.

Both the specs and the example require a local redis instance to be available.

To run it, checkout the source of the library, and then:

git clone https://github.com/kigster/simple-feed.git
cd simple-feed

# on OSX with HomeBrew:
brew install redis
brew services start redis

# check that your redis is up:
redis-cli info

# install bundler and other dependencies
gem install bundler --version 2.1.4
bundle install
bundle exec rspec  # make sure tests are passing

# run the example:
ruby examples/redis_provider_example.rb

The above command will help you download and set up all dependencies, and run the examples for a single user:

running example

Figure 2. Running Redis Example in a Terminal

If you set REDIS_DEBUG variable prior to running the example, you will be able to see every single Redis command executed as the example works its way through. Below is a sample output:

running example redis debug

Figure 3. Running Redis Example with REDIS_DEBUG set

7.1. Generating Ruby API Documentation

rake doc

This should use Yard to generate the documentation, and open your browser once it’s finished.

7.2. Installation

Add this line to your application’s Gemfile:

gem 'simple-feed'

And then execute:

$ bundle

Or install it yourself as:

$ gem install simple-feed

7.3. Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

7.4. Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/kigster/simple-feed

7.5. License

The gem is available as open source under the terms of the MIT License.

FOSSA Scan Status

7.6. Acknowledgements

This project is conceived and sponsored by Simbi, Inc..

Author’s personal experience at Wanelo, Inc. has served as an inspiration.


Author: kigster
Source code: https://github.com/kigster/simple-feed
License: MIT license

#ruby   #ruby-on-rails 

Fast Read and Highly Scalable Optimized Social Activity Feed in Ruby

Kennith Kuhic

1658235600

Cloudforest ML: Ensembles Of Decision Trees in Go/Golang.

CloudForest 

Google Group

Fast, flexible, multi-threaded ensembles of decision trees for machine learning in pure Go (golang).

CloudForest allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical / categorical data with missing values. These include:

  • Breiman and Cutler's Random Forest for Classification and Regression
  • Adaptive Boosting (AdaBoost) Classification
  • Gradient Boosting Tree Regression and Two Class Classification
  • Hellinger Distance Trees for Classification
  • Entropy, Cost driven and Class Weighted classification
  • L1/Absolute Deviance Decision Tree regression
  • Improved Feature Selection via artificial contrasts with ensembles (ACE)
  • Roughly Balanced Bagging for Unbalanced Data
  • Improved robustness using out of bag cases and artificial contrasts.
  • Support for missing values via bias correction or three way splitting.
  • Proximity/Affinity Analysis suitable for manifold learning
  • A number of experimental splitting criteria

The Design Prioritizes:

  • Training speed
  • Performance on highly dimensional heterogeneous datasets (e.g. genetic and clinical data).
  • An optimized set of core functionality.
  • The flexibility to quickly implement new impurities and algorithms using the common core.
  • The ability to natively handle non numerical data types and missing values.
  • Use in a multi core or multi machine environment.

It can achieve quicker training times than many other popular implementations on some datasets. This is the result of CPU-cache-friendly memory utilization well suited to modern processors, and separate, optimized paths to learn splits from binary, numerical and categorical data.

Benchmarks

CloudForest offers good general accuracy, and the alternative and augmented algorithms it implements can offer reduced error rates for specific use cases, especially recovering a signal from noisy, high-dimensional data prone to over-fitting and predicting rare events and unbalanced classes (both of which are typical in genetic studies of diseases). These methods should be included in parameter sweeps to maximize accuracy.

Error

(Work on benchmarks and optimization is ongoing, if you find a slow use case please raise an issue.)

Command line utilities to grow, apply and analyze forests and do cross validation are provided or CloudForest can be used as a library in go programs.

This Document covers command line usage, file formats and some algorithmic background.

Documentation for coding against CloudForest has been generated with godoc and can be viewed live at: http://godoc.org/github.com/ryanbressler/CloudForest

Pull requests, spelling corrections and bug reports are welcome; Code Repo and Issue tracker can be found at: https://github.com/ryanbressler/CloudForest

A google discussion group can be found at: https://groups.google.com/forum/#!forum/cloudforest-dev

CloudForest was created in the Shumelivich Lab at the Institute for Systems Biology.

(Build status includes accuracy tests on iris and Boston housing price datasets and multiple go versions.)

Installation

With go installed:

go get github.com/ryanbressler/CloudForest
go install github.com/ryanbressler/CloudForest/growforest
go install github.com/ryanbressler/CloudForest/applyforest

#optional utilities
go install github.com/ryanbressler/CloudForest/leafcount
go install github.com/ryanbressler/CloudForest/utils/nfold
go install github.com/ryanbressler/CloudForest/utils/toafm

To update to the latest version use the -u flag

go get -u github.com/ryanbressler/CloudForest
go install github.com/ryanbressler/CloudForest/growforest
go install github.com/ryanbressler/CloudForest/applyforest

#optional utilities
go install github.com/ryanbressler/CloudForest/leafcount
go install github.com/ryanbressler/CloudForest/utils/nfold
go install github.com/ryanbressler/CloudForest/utils/toafm

Quick Start

Data can be provided in a tsv-based annotated feature matrix, or in arff or libsvm formats with ".arff" or ".libsvm" extensions. Details are discussed in the Data File Formats section below and a few example data sets are included in the "data" directory.

#grow a predictor forest with default parameters and save it to forest.sf
growforest -train train.fm -rfpred forest.sf -target B:FeatureName

#grow a 1000 tree forest using, 16 cores and report out of bag error 
#with minimum leafSize 8 
growforest -train train.fm -rfpred forest.sf -target B:FeatureName -oob \
-nCores 16 -nTrees 1000 -leafSize 8

#grow a 1000 tree forest evaluating half the features as candidates at each 
#split and reporting out of bag error after each tree to watch for convergence
growforest -train train.fm -rfpred forest.sf -target B:FeatureName -mTry .5 -progress 

#growforest with weighted random forest
growforest -train train.fm -rfpred forest.sf -target B:FeatureName \
-rfweights '{"true":2,"false":0.5}'

#report all growforest options
growforest -h

#Print the (balanced for classification, least squares for regression) error 
#rate on test data to standard out
applyforest -fm test.fm -rfpred forest.sf

#Apply the forest, report error rate and save predictions
#Predictions are output in a tsv as:
#CaseLabel    Predicted    Actual
applyforest -fm test.fm -rfpred forest.sf -preds predictions.tsv

#Calculate counts of case vs case (leaves) and case vs feature (branches) proximity.
#Leaves are reported as:
#Case1 Case2 Count
#Branches Are Reported as:
#Case Feature Count
leafcount -train train.fm -rfpred forest.sf -leaves leaves.tsv -branches branches.tsv

#Generate training and testing folds
nfold -fm data.fm

#growforest with internal training and testing
growforest -train train_0.fm -target N:FeatureName -test test_0.fm

#growforest with internal training and testing, 10 ace feature selection permutations and
#testing performed only using significant features
growforest -train train_0.fm -target N:FeatureName -test test_0.fm -ace 10 -cutoff .05

Growforest Utility

growforest trains a forest using the following parameters, which can be listed with -h.

Parameters are implemented using Go's flag package, so boolean parameters can be set to true with a simple flag:

#the following are equivalent
growforest -oob
growforest -oob=true

And equals signs and quotes are optional for other parameters:

#the following are equivalent
growforest -train featurematrix.afm
growforest -train="featurematrix.afm"
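The flag forms above come straight from Go's standard flag package. The following self-contained sketch (parseOOB and its flag set are illustrative, not growforest's actual registration code) shows the two boolean forms parsing identically:

```go
package main

import (
	"flag"
	"fmt"
)

// parseOOB parses a hypothetical -oob boolean flag from args the same
// way Go's flag package parses growforest's options.
func parseOOB(args []string) bool {
	fs := flag.NewFlagSet("growforest", flag.ContinueOnError)
	oob := fs.Bool("oob", false, "Calculate and report oob error.")
	fs.Parse(args)
	return *oob
}

func main() {
	// Both forms set the flag to true; omitting it leaves the default false.
	fmt.Println(parseOOB([]string{"-oob"}))
	fmt.Println(parseOOB([]string{"-oob=true"}))
	fmt.Println(parseOOB([]string{}))
}
```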

Basic options

  -target="": The row header of the target in the feature matrix.
  -train="featurematrix.afm": AFM formated feature matrix containing training data.
  -rfpred="rface.sf": File name to output predictor forest in sf format.
  -leafSize="0": The minimum number of cases on a leaf node. If <=0 it will be inferred as 1 for classification and 4 for regression.
  -maxDepth=0: Maximum tree depth. Ignored if 0.
  -mTry="0": Number of candidate features for each split as a count (ex: 10) or portion of total (ex: .5). Ceil(sqrt(nFeatures)) if <=0.
  -nSamples="0": The number of cases to sample (with replacement) for each tree as a count (ex: 10) or portion of total (ex: .5). If <=0 set to total number of cases.
  -nTrees=100: Number of trees to grow in the predictor.
 
  -importance="": File name to output importance.

  -oob=false: Calculate and report oob error.
 

Advanced Options

  -blacklist="": A list of feature id's to exclude from the set of predictors.
  -includeRE="": Filter features that DON'T match this RE.
  -blockRE="": A regular expression to identify features that should be filtered out.
  -force=false: Force at least one non constant feature to be tested for each split as in scikit-learn.
  -impute=false: Impute missing values to feature mean/mode before growth.
  -nCores=1: The number of cores to use.
  -progress=false: Report tree number and running oob error.
  -oobpreds="": Calculate and report oob predictions in the file specified.
  -cpuprofile="": write cpu profile to file
  -multiboost=false: Allow multi-threaded boosting which may have unexpected results. (highly experimental)
  -nobag=false: Don't bag samples for each tree.
  -evaloob=false: Evaluate potential splitting features on OOB cases after finding split value in bag.
  -selftest=false: Test the forest on the data and report accuracy.
  -splitmissing=false: Split missing values onto a third branch at each node (experimental).
  -test="": Data to test the model on after training.

Regression Options

  -gbt=0: Use gradient boosting with the specified learning rate.
  -l1=false: Use l1 norm regression (target must be numeric).
  -ordinal=false: Use ordinal regression (target must be numeric).

Classification Options

  -adaboost=false: Use Adaptive boosting for classification.
  -balanceby="": Roughly balanced bag the target within each class of this feature.
  -balance=false: Balance bagging of samples by target class for unbalanced classification.
  -cost="": For categorical targets, a json string to float map of the cost of falsely identifying each category.
  -entropy=false: Use entropy minimizing classification (target must be categorical).
  -hellinger=false: Build trees using hellinger distance.
  -positive="True": Positive class to output probabilities for.
  -rfweights="": For categorical targets, a json string to float map of the weights to use for each category in Weighted RF.
  -NP=false: Do approximate Neyman-Pearson classification.
  -NP_a=0.1: Constraint on precision in NP classification [0,1]
  -NP_k=100: Weight of constraint in NP classification [0,Inf+)
  -NP_pos="1": Class label to constrain precision in NP classification.

Note: rfweights and cost should use json to specify the weights and/or costs per class, using the strings used to represent the class in the boolean or categorical feature:

   growforest -rfweights '{"true":2,"false":0.5}'

Randomizing Data and Artificial Contrasts

Randomly shuffling parts of the data or including shuffled "artificial contrasts" can be useful to establish baselines for comparison.

The "vet" option extends this principle to tree growth. When evaluating potential splitters, it subtracts the impurity decrease of the best split candidate splitters can make on a shuffled target from the impurity decrease of the actual best split. This is intended to penalize certain types of features that contribute to over-fitting, including unique identifiers and sparse features.

  -ace=0: Number ace permutations to do. Output ace style importance and p values.
  -permute: Permute the target feature (to establish random predictive power).
  -contrastall=false: Include a shuffled artificial contrast copy of every feature.
  -nContrasts=0: The number of randomized artificial contrast features to include in the feature matrix.
  -shuffleRE="": A regular expression to identify features that should be shuffled.
  -vet=false: Penalize potential splitter impurity decrease by subtracting the best split of a permuted target.

Applyforest Utility

applyforest applies a forest to the specified feature matrix and outputs predictions as a two column (caselabel predictedvalue) tsv.

Usage of applyforest:
  -expit=false: Expit (inverse logit) transform data (for gradient boosting classification).
  -fm="featurematrix.afm": AFM formated feature matrix containing data.
  -mean=false: Force numeric (mean) voting.
  -mode=false: Force categorical (mode) voting.
  -preds="": The name of a file to write the predictions into.
  -rfpred="rface.sf": A predictor forest.
  -sum=false: Force numeric sum voting (for gradient boosting etc).
  -votes="": The name of a file to write categorical vote totals to.

Leafcount Utility

leafcount outputs counts of case-by-case co-occurrence on leaf nodes (leaves.tsv, Breiman's proximity) and counts of the number of times a feature is used to split a node containing each case (branches.tsv, a measure of relative/local importance).

Usage of leafcount:
  -branches="branches.tsv": a case by feature sparse matrix of split counts in tsv format
  -fm="featurematrix.afm": AFM formated feature matrix to use.
  -leaves="leaves.tsv": a case by case sparse matrix of leaf co-occurrence in tsv format
  -rfpred="rface.sf": A predictor forest.

nfold utility

nfold is a utility for generating cross validation folds. It can read in and output any of the supported formats. You can specify a categorical target feature to do stratified sampling, which will balance the classes between the folds.

If no target feature is specified, a numerical target feature is specified, or the -unstratified option is provided, unstratified sampling will be used.

Usage of nfold:
  -fm="featurematrix.afm": AFM formated feature matrix containing data.
  -folds=5: Number of folds to generate.
  -target="": The row header of the target in the feature matrix.
  -test="test_%v.fm": Format string for testing fms.
  -train="train_%v.fm": Format string for training fms.
  -unstratified=false: Force unstratified sampling of categorical target.
  -writeall=false: Output all three formats.
  -writearff=false: Output arff.
  -writelibsvm=false: Output libsvm.

Importance

Variable importance in CloudForest is based on the mean decrease in impurity over all of the splits made using a feature. It is output in a tsv with the following columns:

0    1    2    3    4    5    6
Feature    Decrease Per Use    Use Count    Decrease Per Tree    Decrease Per Tree Used    Tree Used Count    Mean Minimal Depth

Decrease per tree (col 3 starting from 0) is the most common definition of importance in other implementations and is calculated over all trees, not just the ones the feature was used in.

Each of these scores has different properties:

  • Per-use and per-tree-used scores may be more resistant to feature redundancy.
  • Per-tree-used and per-tree scores may better pick out complex effects.
  • Mean Minimal Depth has been proposed (see "Random Survival Forests") as an alternative importance.
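To make the relationship between these scores concrete, here is a small sketch that aggregates per-split impurity decreases into the columns above; splitRecord, importance, and score are names invented for this example (not CloudForest's API), and mean minimal depth is omitted:

```go
package main

import "fmt"

// splitRecord is one split made while growing a forest.
type splitRecord struct {
	tree     int
	feature  string
	decrease float64
}

// importance holds the per-use and per-tree scores described above.
type importance struct {
	PerUse      float64 // mean decrease per split using the feature
	UseCount    int     // number of splits using the feature
	PerTree     float64 // total decrease / all trees in the forest
	PerTreeUsed float64 // total decrease / trees the feature appears in
}

// score aggregates impurity decreases per feature over a forest of nTrees.
func score(splits []splitRecord, nTrees int) map[string]importance {
	total := map[string]float64{}
	uses := map[string]int{}
	trees := map[string]map[int]bool{}
	for _, s := range splits {
		total[s.feature] += s.decrease
		uses[s.feature]++
		if trees[s.feature] == nil {
			trees[s.feature] = map[int]bool{}
		}
		trees[s.feature][s.tree] = true
	}
	out := map[string]importance{}
	for f, t := range total {
		out[f] = importance{
			PerUse:      t / float64(uses[f]),
			UseCount:    uses[f],
			PerTree:     t / float64(nTrees),
			PerTreeUsed: t / float64(len(trees[f])),
		}
	}
	return out
}

func main() {
	splits := []splitRecord{{0, "f1", 0.5}, {0, "f1", 0.3}, {1, "f2", 0.2}}
	fmt.Printf("%+v\n", score(splits, 2))
}
```

Note how f1's per-tree score is diluted by the tree it never appears in, while its per-tree-used score is not; that is the difference the bullet points describe.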

To provide a baseline for evaluating importance, artificial contrast features can be used by including shuffled copies of existing features (-nContrasts, -contrastAll).

A feature that performs well when randomized (or when the target has been randomized) may be causing over-fitting.

The option to permute the target (-permute) will establish a minimum random baseline. Using a regular expression (-shuffleRE) to shuffle part of the data can be useful in teasing out the contributions of different subsets of features.

Importance with P-Values Via Artificial Contrasts/ACE

P-values can be established for importance scores by comparing the importance score for each feature to that of a shuffled copy of itself (an artificial contrast) over a number of runs. This algorithm is described in Tuv's "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination."

Feature selection based on these p-values can increase the model's resistance to issues including over-fitting from high cardinality features.

In CloudForest these p-values are produced with a Welch's t-test and the null hypothesis that the mean importance of a feature's contrasts is greater than that of the feature itself over all of the forests. To use this method, specify the number of forests/repeats to perform using the "-ace" option and provide a file name for importance scores via the -importance option. Importance scores will be the mean decrease per tree over all of the forests.

growforest -train housing.arff -target class -ace 10 -importance bostanimp.tsv

The output will be a tsv with the following columns:

0    1    2    3
target    predictor    p-value    mean importance
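The heart of that comparison can be sketched as follows. welchT computes only Welch's t statistic from per-forest importance scores; turning it into a p-value additionally requires the t distribution with Welch-Satterthwaite degrees of freedom, which is omitted here, and the function names are illustrative rather than CloudForest's internals:

```go
package main

import (
	"fmt"
	"math"
)

// meanVar returns the sample mean and unbiased variance of xs.
func meanVar(xs []float64) (mean, variance float64) {
	n := float64(len(xs))
	for _, x := range xs {
		mean += x
	}
	mean /= n
	for _, x := range xs {
		d := x - mean
		variance += d * d
	}
	variance /= n - 1
	return
}

// welchT computes Welch's t statistic comparing a feature's importance
// scores across forests against those of its artificial contrasts.
func welchT(feature, contrast []float64) float64 {
	m1, v1 := meanVar(feature)
	m2, v2 := meanVar(contrast)
	n1, n2 := float64(len(feature)), float64(len(contrast))
	return (m1 - m2) / math.Sqrt(v1/n1+v2/n2)
}

func main() {
	// Hypothetical per-forest importance scores over 4 ace repeats.
	feature := []float64{0.9, 1.1, 1.0, 1.2}
	contrast := []float64{0.1, 0.2, 0.15, 0.05}
	fmt.Printf("t = %.2f\n", welchT(feature, contrast))
}
```

A large positive t favors rejecting the null hypothesis that the contrasts are at least as important as the feature.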

This method is often combined with the -evaloob method described below.

growforest -train housing.arff -target class -ace 10 -importance bostanimp.tsv -evaloob

Improved Feature Selection

Genomic data frequently has many noisy, high cardinality, uninformative features which can lead to in-bag over-fitting. To combat this, CloudForest implements some methods designed to help better filter out uninformative features.

The -evaloob method evaluates potential best splitting features on the oob data after learning the split value for each splitter from the in bag/branch data as normal. Importance scores are also calculated using OOB cases. This idea is discussed in Eugene Tuv, Alexander Borisov, George Runger and Kari Torkkola's paper "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination."

The -vet option penalizes the impurity decrease of a potential best split by subtracting the best split that can be made after the target values of the cases on which the split is being evaluated have been shuffled.

In testing so far, evaloob provides better performance and is less computationally intensive. These options can be used together, which may provide the best performance in very noisy data. When used together, vetting is also done on the out of bag cases.

Data With Unbalanced Classes

Genomic studies also frequently have unbalanced target classes; i.e., you might be interested in a rare disease but have samples drawn from the general population. CloudForest implements three methods for dealing with such studies: roughly balanced bagging (-balance), cost weighted classification (-costs) and weighted gini impurity driven classification (-rfweights). See the references below for a discussion of these options.

Missing Values

By default CloudForest uses a fast heuristic for missing values. When proposing a split on a feature with missing data, the missing cases are removed and the impurity value is corrected to use three way impurity, which reduces the bias towards features with lots of missing data:

            I(split) = p(l)I(l)+p(r)I(r)+p(m)I(m)

Missing values in the target variable are left out of impurity calculations.

This provides generally good results at a fraction of the computational cost of imputing data.
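The corrected impurity above can be illustrated with a short sketch, assuming Gini impurity for a categorical target; this is a re-implementation for exposition, not CloudForest's internal code:

```go
package main

import "fmt"

// gini computes the Gini impurity of a slice of class labels.
func gini(labels []string) float64 {
	if len(labels) == 0 {
		return 0
	}
	counts := map[string]int{}
	for _, l := range labels {
		counts[l]++
	}
	n := float64(len(labels))
	imp := 1.0
	for _, c := range counts {
		p := float64(c) / n
		imp -= p * p
	}
	return imp
}

// threeWayImpurity implements I(split) = p(l)I(l) + p(r)I(r) + p(m)I(m),
// weighting the left, right, and missing partitions by their case fractions.
func threeWayImpurity(left, right, missing []string) float64 {
	n := float64(len(left) + len(right) + len(missing))
	if n == 0 {
		return 0
	}
	return float64(len(left))/n*gini(left) +
		float64(len(right))/n*gini(right) +
		float64(len(missing))/n*gini(missing)
}

func main() {
	left := []string{"a", "a"}
	right := []string{"b", "b"}
	missing := []string{"a", "b"}
	fmt.Println(threeWayImpurity(left, right, missing))
}
```

Because the missing partition contributes its own impurity term, a feature cannot look artificially pure just by shedding its missing cases.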

Optionally, -impute can be passed before forest growth to impute missing values to the feature mean/mode, which Breiman suggests as a fast method for imputing values.

This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Breiman describes.

Experimental support (-splitmissing) is provided for 3 way splitting which splits missing cases onto a third branch. This has so far yielded mixed results in testing.

Data File Formats

Data files in CloudForest are assumed to be in our annotated feature matrix tsv-based format unless a .libsvm or .arff file extension is used.

Annotated Feature Matrix Tsv Files

CloudForest borrows the annotated feature matrix (.afm) and stochastic forest (.sf) file formats from Timo Erkkila's rf-ace which can be found at https://code.google.com/p/rf-ace/

An annotated feature matrix (.afm) file is a tab-delimited file with column and row headers. By default columns represent cases and rows represent features/variables, though the transpose (rows as cases/observations) is also detected and supported.

A row header / feature id includes a prefix to specify the feature type. These prefixes are also used to detect column vs row orientation.

"N:" Prefix for numerical feature id.
"C:" Prefix for categorical feature id.
"B:" Prefix for boolean feature id.

Categorical and boolean features use strings for their category labels. Missing values are represented by "?","nan","na", or "null" (case insensitive). A short example:

featureid    case1    case2    case3
N:NumF1    0.0    .1    na
C:CatF2    red    red    green

Some sample feature matrix data files are included in the "data" directory.
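Reading a row under these conventions can be sketched as follows; missing and featureType are illustrative helper names based only on the prefixes and missing-value strings described above:

```go
package main

import (
	"fmt"
	"strings"
)

// missing reports whether an AFM cell encodes a missing value.
// Matching is case insensitive, per the format description.
func missing(cell string) bool {
	switch strings.ToLower(cell) {
	case "?", "nan", "na", "null":
		return true
	}
	return false
}

// featureType returns "numerical", "categorical" or "boolean" from an
// AFM row header prefix, or "" if the header has no known prefix.
func featureType(header string) string {
	switch {
	case strings.HasPrefix(header, "N:"):
		return "numerical"
	case strings.HasPrefix(header, "C:"):
		return "categorical"
	case strings.HasPrefix(header, "B:"):
		return "boolean"
	}
	return ""
}

func main() {
	// One tab-delimited row from the example above.
	row := "N:NumF1\t0.0\t.1\tna"
	fields := strings.Split(row, "\t")
	fmt.Println(featureType(fields[0]))
	for _, cell := range fields[1:] {
		fmt.Println(cell, missing(cell))
	}
}
```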

ARFF Data Files

CloudForest also supports limited import of Weka's ARFF format. This format will be detected via the ".arff" file extension. Only numeric and nominal/categorical attributes are supported; all other attribute types will be assumed to be categorical and should usually be removed or blacklisted. There is no support for spaces in feature names, quoted strings or sparse data. Trailing space or comments after the data field may cause odd behavior.

The ARFF format also provides an easy way to annotate a csv file with information about the supplied fields:

@relation data

@attribute NumF1 numeric
@attribute CatF2 {red,green}

@data
0.0,red
.1,red
?,green

LibSvm/Svm Light Data Files

There is also basic support for sparse numerical data in libsvm's file format. This format will be detected by the ".libsvm" file extension and has some limitations. A simple libsvm file might look like:

24.0 1:0.00632 2:18.00 3:2.310 4:0
21.6 1:0.02731 2:0.00 3:7.070 7:78.90
34.7 1:0.02729 2:0.00 5:0.4690

The target field will be given the designation "0" and be in the "0" position of the matrix, and you will need to use "-target 0" as an option with growforest. No other feature can have this designation.

The categorical or numerical nature of the target variable will be detected from the value on the first line. If it is an integer value like 0, 1 or 1200, the target will be parsed as categorical and classification performed. If it is a floating point value including a decimal place like 1.0, 1.7 etc, the target will be parsed as numerical and regression performed. There is currently no way to override this behavior.
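That detection heuristic can be sketched as follows; targetIsCategorical is an illustrative name for the described behavior, not a function in CloudForest:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// targetIsCategorical mimics the documented heuristic: parse the first
// value on the first line of a libsvm file; an integer literal means
// classification, a value with a decimal point means regression.
func targetIsCategorical(firstLine string) bool {
	target := strings.Fields(firstLine)[0]
	if _, err := strconv.Atoi(target); err == nil {
		return true // e.g. "0", "1", "1200" -> categorical target
	}
	return false // e.g. "1.0", "24.0" -> numerical target
}

func main() {
	fmt.Println(targetIsCategorical("1 1:0.5 3:2.0"))
	fmt.Println(targetIsCategorical("24.0 1:0.00632 2:18.00"))
}
```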

Models - Stochastic Forest Files

A stochastic forest (.sf) file contains a forest of decision trees. The main advantage of this format, as opposed to an established format like json, is that an sf file can be written iteratively tree by tree, and multiple .sf files can be combined with minimal logic, allowing for massively parallel growth of forests with low memory use.

An .sf file consists of lines each of which is a comma separated list of key value pairs. Lines can designate either a FOREST, TREE, or NODE. Each tree belongs to the preceding forest and each node to the preceding tree. Nodes must be written in order of increasing depth.

CloudForest generates fewer fields than rf-ace but requires the following. Other fields will be ignored.

Forest requires forest type (only RF currently), target and ntrees:

FOREST=RF|GBT|..,TARGET="$feature_id",NTREES=int

Tree requires only an int; the value is ignored, though the line is needed to designate a new tree:

TREE=int

Node requires a path encoded so that the root node is specified by "*" and each split left or right as "L" or "R". Leaf nodes should also define PRED such as "PRED=1.5" or "PRED=red". Splitter nodes should define SPLITTER with a feature id inside of double quotes, SPLITTERTYPE=[CATEGORICAL|NUMERICAL] and a LVALUE term which can be either a float inside of double quotes representing the highest value sent left or a ":" separated list of categorical values sent left.

NODE=$path,PRED=[float|string],SPLITTER="$feature_id",SPLITTERTYPE=[CATEGORICAL|NUMERICAL],LVALUES="[float|: separated list]"

An example .sf file:

FOREST=RF,TARGET="N:CLIN:TermCategory:NB::::",NTREES=12800
TREE=0
NODE=*,PRED=3.48283,SPLITTER="B:SURV:Family_Thyroid:F::::maternal",SPLITTERTYPE=CATEGORICAL,LVALUES="false"
NODE=*L,PRED=3.75
NODE=*R,PRED=1
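Tokenizing one .sf line into key/value pairs can be sketched as below, assuming (as holds in the example above) that quoted values never contain commas:

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine splits one .sf line into key/value pairs, stripping the
// double quotes around feature ids and LVALUES. This sketch assumes
// quoted values never contain commas, which holds for the example
// above but is not guaranteed by the format.
func parseLine(line string) map[string]string {
	kv := map[string]string{}
	for _, field := range strings.Split(line, ",") {
		parts := strings.SplitN(field, "=", 2)
		if len(parts) != 2 {
			continue
		}
		kv[parts[0]] = strings.Trim(parts[1], `"`)
	}
	return kv
}

func main() {
	node := parseLine(`NODE=*,PRED=3.48283,SPLITTERTYPE=CATEGORICAL,LVALUES="false"`)
	fmt.Println(node["NODE"], node["SPLITTERTYPE"], node["LVALUES"])
}
```

A reader would dispatch on whether a line defines FOREST, TREE, or NODE, attaching each node to the preceding tree by its "*L"/"*R" path.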

CloudForest can parse and apply .sf files generated by at least some versions of rf-ace.

Compiling for Speed

When compiled with go1.1, CloudForest achieves running times similar to implementations in other languages. Using gccgo (4.8.0 at least) results in longer running times and is not recommended. This may change as gccgo adopts the go 1.1 way of implementing closures.

References

The idea for (and trademark of the term) Random Forests originated with Leo Breiman and Adele Cutler. Their code and papers can be found at:

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

All code in CloudForest is original, but some ideas for methods and optimizations were inspired by Timo Erkkila's rf-ace and Andy Liaw and Matthew Wiener's randomForest R package based on Breiman and Cutler's code:

https://code.google.com/p/rf-ace/ http://cran.r-project.org/web/packages/randomForest/index.html

The idea for artificial contrasts is based on Eugene Tuv and Kari Torkkola's "Feature Filtering with Ensembles Using Artificial Contrasts" http://enpub.fulton.asu.edu/workshop/FSDM05-Proceedings.pdf#page=74 and Eugene Tuv, Alexander Borisov, George Runger and Kari Torkkola's "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination" http://www.researchgate.net/publication/220320233_Feature_Selection_with_Ensembles_Artificial_Variables_and_Redundancy_Elimination/file/d912f5058a153a8b35.pdf

The idea for growing trees to minimize categorical entropy comes from Ross Quinlan's ID3: http://en.wikipedia.org/wiki/ID3_algorithm

"The Elements of Statistical Learning" 2nd edition by Trevor Hastie, Robert Tibshirani and Jerome Friedman was also consulted during development.

Methods for classification from unbalanced data are covered in several papers: http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163175/ http://www.biomedcentral.com/1471-2105/11/523 http://bib.oxfordjournals.org/content/early/2012/03/08/bib.bbs006 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0067863

Density estimating trees/forests are discussed in: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p627.pdf http://research.microsoft.com/pubs/158806/CriminisiForests_FoundTrends_2011.pdf The latter also introduces the idea of manifold forests, which can be learned using downstream analysis of the outputs of leafcount to find the Fiedler vectors of the graph laplacian.


Author: ryanbressler
Source code: https://github.com/ryanbressler/CloudForest
License: View license

#go #golang #machine-learning 

Cloudforest ML: Ensembles Of Decision Trees in Go/Golang.

Cloudforest: Ensembles Of Decision Trees in Go/Golang

CloudForest

Google Group

Fast, flexible, multi-threaded ensembles of decision trees for machine learning in pure Go (golang).

CloudForest allows for a number of related algorithms for classification, regression, feature selection and structure analysis on heterogeneous numerical / categorical data with missing values. These include:

  • Breiman and Cutler's Random Forest for Classification and Regression
  • Adaptive Boosting (AdaBoost) Classification
  • Gradient Boosting Tree Regression and Two Class Classification
  • Hellinger Distance Trees for Classification
  • Entropy, Cost driven and Class Weighted classification
  • L1/Absolute Deviance Decision Tree regression
  • Improved Feature Selection via artificial contrasts with ensembles (ACE)
  • Roughly Balanced Bagging for Unbalanced Data
  • Improved robustness using out of bag cases and artificial contrasts
  • Support for missing values via bias correction or three way splitting
  • Proximity/Affinity Analysis suitable for manifold learning
  • A number of experimental splitting criteria

The Design Prioritizes:

  • Training speed
  • Performance on highly dimensional heterogeneous datasets (e.g. genetic and clinical data).
  • An optimized set of core functionality.
  • The flexibility to quickly implement new impurities and algorithms using the common core.
  • The ability to natively handle non numerical data types and missing values.
  • Use in a multi core or multi machine environment.

It can achieve quicker training times than many other popular implementations on some datasets. This is the result of CPU cache-friendly memory utilization well suited to modern processors and separate, optimized paths to learn splits from binary, numerical and categorical data.

Benchmarks

CloudForest offers good general accuracy, and the alternative and augmented algorithms it implements can offer reduced error rates for specific use cases, especially recovering a signal from noisy, high dimensional data prone to over-fitting and predicting rare events and unbalanced classes (both of which are typical in genetic studies of diseases). These methods should be included in parameter sweeps to maximize accuracy.


(Work on benchmarks and optimization is ongoing, if you find a slow use case please raise an issue.)

Command line utilities to grow, apply and analyze forests and do cross validation are provided or CloudForest can be used as a library in go programs.

This Document covers command line usage, file formats and some algorithmic background.

Documentation for coding against CloudForest has been generated with godoc and can be viewed live at: http://godoc.org/github.com/ryanbressler/CloudForest

Pull requests, spelling corrections and bug reports are welcome; Code Repo and Issue tracker can be found at: https://github.com/ryanbressler/CloudForest

A google discussion group can be found at: https://groups.google.com/forum/#!forum/cloudforest-dev

CloudForest was created in the Shumelivich Lab at the Institute for Systems Biology.

(Build status includes accuracy tests on iris and Boston housing price datasets and multiple go versions.)

Installation

With go installed:

go get github.com/ryanbressler/CloudForest
go install github.com/ryanbressler/CloudForest/growforest
go install github.com/ryanbressler/CloudForest/applyforest

#optional utilities
go install github.com/ryanbressler/CloudForest/leafcount
go install github.com/ryanbressler/CloudForest/utils/nfold
go install github.com/ryanbressler/CloudForest/utils/toafm

To update to the latest version use the -u flag

go get -u github.com/ryanbressler/CloudForest
go install -u github.com/ryanbressler/CloudForest/growforest
go install -u github.com/ryanbressler/CloudForest/applyforest

#optional utilities
go install -u github.com/ryanbressler/CloudForest/leafcount
go install -u github.com/ryanbressler/CloudForest/utils/nfold
go install -u github.com/ryanbressler/CloudForest/utils/toafm

Quick Start

Data can be provided in a tsv based anotated feature matrix or in arff or libsvm formats with ".arff" or ".libsvm" extensions. Details are discussed in the Data File Formats section below and a few example data sets are included in the "data" directory.

#grow a predictor forest with default parameters and save it to forest.sf
growforest -train train.fm -rfpred forest.sf -target B:FeatureName

#grow a 1000 tree forest using, 16 cores and report out of bag error 
#with minimum leafSize 8 
growforest -train train.fm -rfpred forest.sf -target B:FeatureName -oob \
-nCores 16 -nTrees 1000 -leafSize 8

#grow a 1000 tree forest evaluating half the features as candidates at each 
#split and reporting out of bag error after each tree to watch for convergence
growforest -train train.fm -rfpred forest.sf -target B:FeatureName -mTry .5 -progress 

#growforest with weighted random forest
growforest -train train.fm -rfpred forest.sf -target B:FeatureName \
-rfweights '{"true":2,"false":0.5}'

#report all growforest options
growforest -h

#Print the (balanced for classification, least squares for regression error 
#rate on test data to standard out
applyforest -fm test.fm -rfpred forest.sf

#Apply the forest, report errorrate and save predictions
#Predictions are output in a tsv as:
#CaseLabel    Predicted    Actual
applyforest -fm test.fm -rfpred forest.sf -preds predictions.tsv

#Calculate counts of case vs case (leaves) and case vs feature (branches) proximity.
#Leaves are reported as:
#Case1 Case2 Count
#Branches Are Reported as:
#Case Feature Count
leafcount -train train.fm -rfpred forest.sf -leaves leaves.tsv -branches branches.tsv

#Generate training and testing folds
nfold -fm data.fm

#growforest with internal training and testing
growforest -train train_0.fm -target N:FeatureName -test test_0.fm

#growforest with internal training and testing, 10 ace feature selection permutations and
#testing performed only using significant features
growforest -train train_0.fm -target N:FeatureName -test test_0.fm -ace 10 -cutoff .05

Growforest Utility

growforest trains a forest using the following parameters which can be listed with -h

Parameter's are implemented using go's parameter parser so that boolean parameters can be set to true with a simple flag:

#the following are equivalent
growforest -oob
growforest -oob=true

And equals signs and quotes are optional for other parameters:

#the following are equivalent
growforest -train featurematrix.afm
growforest -train="featurematrix.afm"

Basic options

  -target="": The row header of the target in the feature matrix.
  -train="featurematrix.afm": AFM formated feature matrix containing training data.
  -rfpred="rface.sf": File name to output predictor forest in sf format.
  -leafSize="0": The minimum number of cases on a leaf node. If <=0 will be inferred to 1 for classification 4 for regression.
  -maxDepth=0: Maximum tree depth. Ignored if 0.
  -mTry="0": Number of candidate features for each split as a count (ex: 10) or portion of total (ex: .5). Ceil(sqrt(nFeatures)) if <=0.
  -nSamples="0": The number of cases to sample (with replacement) for each tree as a count (ex: 10) or portion of total (ex: .5). If <=0 set to total number of cases.
  -nTrees=100: Number of trees to grow in the predictor.
 
  -importance="": File name to output importance.

  -oob=false: Calculate and report oob error.
 

Advanced Options

  -blacklist="": A list of feature id's to exclude from the set of predictors.
  -includeRE="": Filter features that DON'T match this RE.
  -blockRE="": A regular expression to identify features that should be filtered out.
  -force=false: Force at least one non constant feature to be tested for each split as in scikit-learn.
  -impute=false: Impute missing values to feature mean/mode before growth.
  -nCores=1: The number of cores to use.
  -progress=false: Report tree number and running oob error.
  -oobpreds="": Calculate and report oob predictions in the file specified.
  -cpuprofile="": write cpu profile to file
  -multiboost=false: Allow multi-threaded boosting which may have unexpected results. (highly experimental)
  -nobag=false: Don't bag samples for each tree.
  -evaloob=false: Evaluate potential splitting features on OOB cases after finding split value in bag.
  -selftest=false: Test the forest on the data and report accuracy.
  -splitmissing=false: Split missing values onto a third branch at each node (experimental).
  -test="": Data to test the model on after training.

Regression Options

  -gbt=0: Use gradient boosting with the specified learning rate.
  -l1=false: Use l1 norm regression (target must be numeric).
  -ordinal=false: Use ordinal regression (target must be numeric).

Classification Options

  -adaboost=false: Use Adaptive boosting for classification.
  -balanceby="": Roughly balanced bag the target within each class of this feature.
  -balance=false: Balance bagging of samples by target class for unbalanced classification.
  -cost="": For categorical targets, a json string to float map of the cost of falsely identifying each category.
  -entropy=false: Use entropy minimizing classification (target must be categorical).
  -hellinger=false: Build trees using hellinger distance.
  -positive="True": Positive class to output probabilities for.
  -rfweights="": For categorical targets, a json string to float map of the weights to use for each category in Weighted RF.
  -NP=false: Do approximate Neyman-Pearson classification.
  -NP_a=0.1: Constraint on percision in NP classification [0,1]
  -NP_k=100: Weight of constraint in NP classification [0,Inf+)
  -NP_pos="1": Class label to constrain percision in NP classification.

Note: rfweights and cost should use json to specify the weights and or costs per class using the strings used to represent the class in the boolean or categorical feature:

   growforest -rfweights '{"true":2,"false":0.5}'

Randomizing Data and Artificial Contrasts

Randomizing shuffling parts of the data or including shuffled "Artifichal Contrasts" can be useful to establish baselines for comparison.

The "vet" option extends the principle to tree growth. When evaluating potential splitters it subtracts the impurity decrease from the best split candidate splitters can make on a shuffled target from the impurity decrease of the actual best split. This is intended to penalizes certain types of features that contribute to over-fitting including unique identifiers and sparse features

  -ace=0: Number ace permutations to do. Output ace style importance and p values.
  -permute: Permute the target feature (to establish random predictive power).
  -contrastall=false: Include a shuffled artificial contrast copy of every feature.
  -nContrasts=0: The number of randomized artificial contrast features to include in the feature matrix.
  -shuffleRE="": A regular expression to identify features that should be shuffled.
  -vet=false: Penalize potential splitter impurity decrease by subtracting the best split of a permuted target.

Applyforrest Utility

applyforest applies a forest to the specified feature matrix and outputs predictions as a two-column (caselabel, predictedvalue) tsv.

Usage of applyforest:
  -expit=false: Expit (inverse logit) transform data (for gradient boosting classification).
  -fm="featurematrix.afm": AFM formatted feature matrix containing data.
  -mean=false: Force numeric (mean) voting.
  -mode=false: Force categorical (mode) voting.
  -preds="": The name of a file to write the predictions into.
  -rfpred="rface.sf": A predictor forest.
  -sum=false: Force numeric sum voting (for gradient boosting etc).
  -votes="": The name of a file to write categorical vote totals to.

Leafcount Utility

leafcount outputs counts of case-by-case co-occurrence on leaf nodes (leaves.tsv, Breiman's proximity) and counts of the number of times a feature is used to split a node containing each case (branches.tsv, a measure of relative/local importance).

Usage of leafcount:
  -branches="branches.tsv": a case by feature sparse matrix of split counts in tsv format
  -fm="featurematrix.afm": AFM formatted feature matrix to use.
  -leaves="leaves.tsv": a case by case sparse matrix of leaf co-occurrence in tsv format
  -rfpred="rface.sf": A predictor forest.

nfold utility

nfold is a utility for generating cross validation folds. It can read in and output any of the supported formats. You can specify a categorical target feature to do stratified sampling, which will balance the classes between the folds.

If no target feature is specified, a numerical target feature is specified, or the -unstratified option is provided, unstratified sampling will be used.

Usage of nfold:
  -fm="featurematrix.afm": AFM formatted feature matrix containing data.
  -folds=5: Number of folds to generate.
  -target="": The row header of the target in the feature matrix.
  -test="test_%v.fm": Format string for testing fms.
  -train="train_%v.fm": Format string for training fms.
  -unstratified=false: Force unstratified sampling of categorical target.
  -writeall=false: Output all three formats.
  -writearff=false: Output arff.
  -writelibsvm=false: Output libsvm.

Importance

Variable importance in CloudForest is based on the mean decrease in impurity over all of the splits made using a feature. It is output in a tsv with the following columns:

  0  Feature
  1  Decrease Per Use
  2  Use Count
  3  Decrease Per Tree
  4  Decrease Per Tree Used
  5  Tree Used Count
  6  Mean Minimal Depth

Decrease per tree (col 3 starting from 0) is the most common definition of importance in other implementations and is calculated over all trees, not just the ones the feature was used in.

Each of these scores has different properties:

  • Per-use and per-tree-used scores may be more resistant to feature redundancy.
  • Per-tree-used and per-tree scores may better pick out complex effects.
  • Mean Minimal Depth has been proposed (see "Random Survival Forests") as an alternative importance.

To provide a baseline for evaluating importance, artificial contrast features can be used by including shuffled copies of existing features (-nContrasts, -contrastAll).

A feature that performs well when randomized (or when the target has been randomized) may be causing over-fitting.

The option to permute the target (-permute) will establish a minimum random baseline. Using a regular expression (-shuffleRE) to shuffle part of the data can be useful in teasing out the contributions of different subsets of features.

Importance with P-Values Via Artificial Contrasts/ACE

P-values can be established for importance scores by comparing the importance score for each feature to that of a shuffled copy of itself, or an artificial contrast, over a number of runs. This algorithm is described in Tuv's "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination."

Feature selection based on these p-values can increase the model's resistance to issues including over-fitting from high cardinality features.

In CloudForest these p-values are produced with a Welch's t-test and the null hypothesis that the mean importance of a feature's contrasts is greater than that of the feature itself over all of the forests. To use this method, specify the number of forests/repeats to perform using the "-ace" option and provide a file name for importance scores via the -importance option. Importance scores will be the mean decrease per tree over all of the forests.

growforest -train housing.arff -target class -ace 10 -importance bostanimp.tsv

The output will be a tsv with the following columns:

  0  target
  1  predictor
  2  p-value
  3  mean importance

This method is often combined with the -evaloob method described below.

growforest -train housing.arff -target class -ace 10 -importance bostanimp.tsv -evaloob
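The Welch's t statistic at the heart of this procedure compares a feature's importance scores across repeated forests to those of its contrasts. A minimal Go sketch, assuming unbiased sample variances (illustrative only, not CloudForest's internal code):

```go
package main

import (
	"fmt"
	"math"
)

// meanVar returns the sample mean and unbiased sample variance of x.
func meanVar(x []float64) (mean, variance float64) {
	for _, v := range x {
		mean += v
	}
	mean /= float64(len(x))
	for _, v := range x {
		variance += (v - mean) * (v - mean)
	}
	variance /= float64(len(x) - 1)
	return
}

// welchT computes Welch's t statistic comparing the importance scores a
// feature earned over repeated forests (feat) to those of its shuffled
// contrasts (contrast).
func welchT(feat, contrast []float64) float64 {
	mf, vf := meanVar(feat)
	mc, vc := meanVar(contrast)
	return (mf - mc) / math.Sqrt(vf/float64(len(feat))+vc/float64(len(contrast)))
}

func main() {
	feat := []float64{0.9, 1.1, 1.0, 1.2}
	contrast := []float64{0.1, 0.2, 0.15, 0.12}
	fmt.Printf("t = %.2f\n", welchT(feat, contrast))
}
```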

Improved Feature Selection

Genomic data frequently has many noisy, high cardinality, uninformative features, which can lead to in-bag over-fitting. To combat this, CloudForest implements some methods designed to help better filter out uninformative features.

The -evaloob method evaluates potential best splitting features on the oob data after learning the split value for each splitter from the in bag/branch data as normal. Importance scores are also calculated using OOB cases. This idea is discussed in Eugene Tuv, Alexander Borisov, George Runger and Kari Torkkola's paper "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination."

The -vet option penalizes the impurity decrease of each potential best split by subtracting the best split that can be made after the target values of the cases on which the split is being evaluated have been shuffled.

In testing so far, evaloob provides better performance and is less computationally intensive. These options can be used together, which may provide the best performance in very noisy data. When used together, vetting is also done on the out of bag cases.

Data With Unbalanced Classes

Genomic studies also frequently have unbalanced target classes; i.e., you might be interested in a rare disease but have samples drawn from the general population. CloudForest implements three methods for dealing with such studies: roughly balanced bagging (-balance), cost weighted classification (-cost) and weighted gini impurity driven classification (-rfweights). See the references below for a discussion of these options.

Missing Values

By default CloudForest uses a fast heuristic for missing values. When proposing a split on a feature with missing data, the missing cases are removed and the impurity value is corrected to use three way impurity, which reduces the bias towards features with lots of missing data:

            I(split) = p(l)I(l)+p(r)I(r)+p(m)I(m)

Missing values in the target variable are left out of impurity calculations.

This provides generally good results at a fraction of the computational cost of imputing data.
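The corrected three way impurity above can be sketched directly in Go, weighting each branch's impurity by the fraction of cases sent to it (an illustrative sketch, not CloudForest's internal code):

```go
package main

import "fmt"

// threeWayImpurity computes the corrected impurity for a proposed split when
// some cases have missing values, following the formula above:
// I(split) = p(l)*I(l) + p(r)*I(r) + p(m)*I(m), where each p is the fraction
// of cases sent down that branch (left, right, missing).
func threeWayImpurity(nL, nR, nM int, iL, iR, iM float64) float64 {
	total := float64(nL + nR + nM)
	return float64(nL)/total*iL + float64(nR)/total*iR + float64(nM)/total*iM
}

func main() {
	// 40 cases left, 40 right, 20 missing, with per-branch impurities.
	fmt.Println(threeWayImpurity(40, 40, 20, 0.2, 0.3, 0.5))
}
```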

Optionally, -impute can be called before forest growth to impute missing values to the feature mean/mode, which Breiman suggests as a fast method for imputing values.

This forest could also be analyzed for proximity (using leafcount or tree.GetLeaves) to do the more accurate proximity weighted imputation Breiman describes.

Experimental support (-splitmissing) is provided for 3 way splitting which splits missing cases onto a third branch. This has so far yielded mixed results in testing.

Data File Formats

Data files in CloudForest are assumed to be in our Annotated Feature Matrix tsv based format unless a .libsvm or .arff file extension is used.

Annotated Feature Matrix Tsv Files

CloudForest borrows the annotated feature matrix (.afm) and stochastic forest (.sf) file formats from Timo Erkkila's rf-ace which can be found at https://code.google.com/p/rf-ace/

An annotated feature matrix (.afm) file is a tab delimited file with column and row headers. By default columns represent cases and rows represent features/variables, though the transpose (rows as cases/observations) is also detected and supported.

A row header / feature id includes a prefix to specify the feature type. These prefixes are also used to detect column vs row orientation.

"N:" Prefix for numerical feature id.
"C:" Prefix for categorical feature id.
"B:" Prefix for boolean feature id.

Categorical and boolean features use strings for their category labels. Missing values are represented by "?","nan","na", or "null" (case insensitive). A short example:

featureid    case1    case2    case3
N:NumF1    0.0    .1    na
C:CatF2    red    red    green

Some sample feature matrix data files are included in the "data" directory.
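The feature type prefixes and missing value conventions above can be sketched in Go as follows (an illustrative sketch, not CloudForest's actual parser):

```go
package main

import (
	"fmt"
	"strings"
)

// featureType reports the type implied by an AFM row header prefix:
// "N:" numerical, "C:" categorical, "B:" boolean.
func featureType(id string) string {
	switch {
	case strings.HasPrefix(id, "N:"):
		return "numerical"
	case strings.HasPrefix(id, "C:"):
		return "categorical"
	case strings.HasPrefix(id, "B:"):
		return "boolean"
	}
	return "unknown"
}

// isMissing reports whether a value string encodes a missing value
// ("?", "nan", "na", or "null", case insensitive).
func isMissing(v string) bool {
	switch strings.ToLower(v) {
	case "?", "nan", "na", "null":
		return true
	}
	return false
}

func main() {
	fmt.Println(featureType("N:NumF1"), isMissing("na"))
}
```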

ARFF Data Files

CloudForest also supports limited import of weka's ARFF format. This format will be detected via the ".arff" file extension. Only numeric and nominal/categorical attributes are supported; all other attribute types will be assumed to be categorical and should usually be removed or blacklisted. There is no support for spaces in feature names, quoted strings or sparse data. Trailing space or comments after the data field may cause odd behavior.

The ARFF format also provides an easy way to annotate a csv file with information about the supplied fields:

@relation data

@attribute NumF1 numeric
@attribute CatF2 {red,green}

@data
0.0,red
.1,red
?,green

LibSvm/Svm Light Data Files

There is also basic support for sparse numerical data in libsvm's file format. This format will be detected by the ".libsvm" file extension and has some limitations. A simple libsvm file might look like:

24.0 1:0.00632 2:18.00 3:2.310 4:0
21.6 1:0.02731 2:0.00 3:7.070 7:78.90
34.7 1:0.02729 2:0.00 5:0.4690

The target field will be given the designation "0" and be in the "0" position of the matrix and you will need to use "-target 0" as an option with growforest. No other feature can have this designation.

The categorical or numerical nature of the target variable will be detected from the value on the first line. If it is an integer value like 0, 1 or 1200, the target will be parsed as categorical and classification performed. If it is a floating point value including a decimal place, like 1.0 or 1.7, the target will be parsed as numerical and regression performed. There is currently no way to override this behavior.
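This detection rule can be sketched as a simple check for a decimal point in the first target value (an illustrative sketch, not CloudForest's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// targetIsNumerical mirrors the rule described above: a first-line target
// containing a decimal point is treated as numerical (regression); otherwise
// an integer value is treated as categorical (classification).
func targetIsNumerical(firstValue string) bool {
	return strings.Contains(firstValue, ".")
}

func main() {
	fmt.Println(targetIsNumerical("24.0"), targetIsNumerical("1200"))
}
```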

Models - Stochastic Forest Files

A stochastic forest (.sf) file contains a forest of decision trees. The main advantage of this format as opposed to an established format like json is that an sf file can be written iteratively tree by tree and multiple .sf files can be combined with minimal logic required allowing for massively parallel growth of forests with low memory use.

An .sf file consists of lines each of which is a comma separated list of key value pairs. Lines can designate either a FOREST, TREE, or NODE. Each tree belongs to the preceding forest and each node to the preceding tree. Nodes must be written in order of increasing depth.

CloudForest generates fewer fields than rf-ace but requires the following. Other fields will be ignored.

Forest requires forest type (only RF currently), target and ntrees:

FOREST=RF|GBT|..,TARGET="$feature_id",NTREES=int

Tree requires only an int and the value is ignored, though the line is needed to designate a new tree:

TREE=int

Node requires a path encoded so that the root node is specified by "*" and each split left or right as "L" or "R". Leaf nodes should also define PRED such as "PRED=1.5" or "PRED=red". Splitter nodes should define SPLITTER with a feature id inside of double quotes, SPLITTERTYPE=[CATEGORICAL|NUMERICAL] and a LVALUE term which can be either a float inside of double quotes representing the highest value sent left or a ":" separated list of categorical values sent left.

NODE=$path,PRED=[float|string],SPLITTER="$feature_id",SPLITTERTYPE=[CATEGORICAL|NUMERICAL],LVALUES="[float|: separated list]"

An example .sf file:

FOREST=RF,TARGET="N:CLIN:TermCategory:NB::::",NTREES=12800
TREE=0
NODE=*,PRED=3.48283,SPLITTER="B:SURV:Family_Thyroid:F::::maternal",SPLITTERTYPE=CATEGORICAL,LVALUES="false"
NODE=*L,PRED=3.75
NODE=*R,PRED=1
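Since each .sf line is a comma separated list of key value pairs, parsing can be sketched as below. This is a simplified illustration, not CloudForest's actual parser, and it assumes no commas occur inside quoted values:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSFLine splits one line of an .sf file into key/value pairs,
// stripping surrounding double quotes from values.
func parseSFLine(line string) map[string]string {
	kv := map[string]string{}
	for _, field := range strings.Split(line, ",") {
		parts := strings.SplitN(field, "=", 2)
		if len(parts) == 2 {
			kv[parts[0]] = strings.Trim(parts[1], `"`)
		}
	}
	return kv
}

func main() {
	n := parseSFLine(`NODE=*L,PRED=3.75`)
	fmt.Println(n["NODE"], n["PRED"])
}
```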

CloudForest can parse and apply .sf files generated by at least some versions of rf-ace.

Compiling for Speed

When compiled with go1.1 CloudForest achieves running times similar to implementations in other languages. Using gccgo (4.8.0 at least) results in longer running times and is not recommended. This may change as gcc go adopts the go 1.1 way of implementing closures.

References

The idea for (and trademark of the term) Random Forests originated with Leo Breiman and Adele Cutler. Their code and papers can be found at:

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

All code in CloudForest is original, but some ideas for methods and optimizations were inspired by Timo Erkkila's rf-ace and Andy Liaw and Matthew Wiener's randomForest R package, based on Breiman and Cutler's code:

https://code.google.com/p/rf-ace/ http://cran.r-project.org/web/packages/randomForest/index.html

The idea for Artificial Contrasts is based on: Eugene Tuv and Kari Torkkola's "Feature Filtering with Ensembles Using Artificial Contrasts" http://enpub.fulton.asu.edu/workshop/FSDM05-Proceedings.pdf#page=74 and Eugene Tuv, Alexander Borisov, George Runger and Kari Torkkola's "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination" http://www.researchgate.net/publication/220320233_Feature_Selection_with_Ensembles_Artificial_Variables_and_Redundancy_Elimination/file/d912f5058a153a8b35.pdf

The idea for growing trees to minimize categorical entropy comes from Ross Quinlan's ID3: http://en.wikipedia.org/wiki/ID3_algorithm

"The Elements of Statistical Learning" 2nd edition by Trevor Hastie, Robert Tibshirani and Jerome Friedman was also consulted during development.

Methods for classification from unbalanced data are covered in several papers: http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163175/ http://www.biomedcentral.com/1471-2105/11/523 http://bib.oxfordjournals.org/content/early/2012/03/08/bib.bbs006 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0067863

Density Estimating Trees/Forests are discussed in: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p627.pdf http://research.microsoft.com/pubs/158806/CriminisiForests_FoundTrends_2011.pdf The latter also introduces the idea of manifold forests, which can be learned using downstream analysis of the outputs of leafcount to find the Fiedler vectors of the graph laplacian.

Author: Ryanbressler
Source Code: https://github.com/ryanbressler/CloudForest 
License: View license

#go #golang #machinelearning 

Cloudforest: Ensembles Of Decision Trees in Go/Golang
August  Larson

August Larson

1625021700

Create an Advanced Gantt Chart in Python

How to generate a Gantt Chart which contains additional details like Departments, Percentage Completion etc.

Introduction

This is the second part of visualizing Gantt Chart using python. Check this article in which steps for creating a basic Gantt Chart are explained in detail.

The basic Gantt Chart contains bars corresponding to every task of the project. In this article we will improve upon the basic Gantt Chart including details like completion status and sub-category for each task.

#data-visualization #gantt-chart #report #python #create #advanced gantt chart in python

Create an Advanced Gantt Chart in Python
Kabanda  Nat

Kabanda Nat

1624172593

A web testing deep dive: The MDN web testing report

In the Web DNA studies for 2019 and 2020, developers ranked the need “Having to support specific browsers, (e.g., IE11)” as the most frustrating aspect of web development, among 28 needs. The 2nd and 3rd rankings were also related to browser compatibility:

  1. Avoiding or removing a feature that doesn’t work across browsers
  2. Making a design look/work the same across browsers

In 2020, we released our browser compatibility research results — a deeper dive into identifying specific issues around browser compatibility and pinpointing what can be done to mitigate these issues. This year we decided to follow up with another deep dive focused on the 4th most frustrating aspect of developing for the web, “Testing across browsers.” It follows on nicely from the previous deep dive, and also concerns much-sought-after information.

#web testing #web developers #report #web dev

A web testing deep dive: The MDN web testing report
Orlo  Gottlieb

Orlo Gottlieb

1618856940

Artificial intelligence and Bigdata In The Program Of The US FTC

The relationship between bigdata and Artificial Intelligence has been at the center of the speech of the Acting Chairman of the Federal Trade Commission, Rebecca Kelly Slaughter, at the Forum for the Future of Privacy, reported in the document entitled “Protecting consumer privacy in times of crisis”, on February 10, 2021.
The starting point of her talk was this undoubtedly relevant observation: “…companies are collecting and using consumer data in illegal ways: we should require violators to reject not only the illegally obtained data, but also the benefits — here, the algorithms — generated by that data.” Slaughter then proposes, downstream from recent cases involving photo apps or women’s tech apps, that effective consumer notification of wrongdoing should be used, and have a greater impact, than FTC enforcement orders, which remain all but unknown.

#ftc #big-data #artificial-intelligence #ai #report

Artificial intelligence and Bigdata In The Program Of The US FTC

GitHub Availability Report: November 2020

Introduction

In November, we experienced two incidents resulting in significant impact and degraded state of availability for issues, pull requests, and GitHub Actions services.

November 2 12:00 UTC (lasting 32 minutes)

The SSL certificate for *.githubassets.com expired, impacting web requests for GitHub.com UI and services. There was an auto-generated issue indicating the certificate was within 30 days of expiration, but it was not addressed in time. Impact was reported, and the on-call engineer remediated it promptly.

#actions #github actions #report #github

GitHub Availability Report: November 2020
Brennan  Veum

Brennan Veum

1615281000

JavaScript Trends in 2021: Analysing State Of JavaScript Report Results

The annual report “State of JavaScript 2020” is out with some thought-provoking and eyebrow-raising results. We couldn’t omit such an important study on this programming language, hence why I’ve called my workmates to pull out their crystal balls and together we tried to make an educated prediction about the upcoming JavaScript ecosystem trends in 2021.

JavaScript’s flavour is overpowered by Typescript popularity

What can we say, TypeScript is definitely way past being a top JavaScript trend and we can talk about a standard now. And an obvious choice for the vast majority of software development projects. Typescript popularity has been increasing for the last four years and it’s 100% the most promising language for 2021 with a 78% usage ratio amongst professional developers.

#frontend #javascript #report

JavaScript Trends in 2021: Analysing State Of JavaScript Report Results

Sean Dillard

1613076971

How To Report On Software Testing

Being able to write concise, easily comprehensible software testing reports is an important skill for software development team members to possess, particularly those in quality assurance, development, and support. Poorly written software testing reports can make the development process more difficult and less productive.

Imagine a client asks if their app is ready for launch and based on your assessment, everything is working correctly. A couple of months later, the client complains of critical bugs, which weren’t apparent until the app was in production. Now the client is demanding that your development team fix the issues, while you’ve already started devoting resources to a new project.

This type of situation could be avoided with a thorough testing management phase, which includes comprehensive testing reports. Let’s examine the how’s and why’s of software testing reports.

Agile testing or waterfall?

One of the reasons why development teams receive complaints after a release is a lack of software testing. A proper testing report is a record which contains all test activities and test results. It helps in determining the current status of a project and analyzing what corrective actions need to be made.

Agile testing is a testing practice that adheres to the standards and rules of agile software development. Agile testing strategy isn’t linear, but continuous and adapts to software updates and changes in scope.

For example, cyber security software is always evolving so an agile testing method would be better recommended for software reporting in cyber security applications. Learn more cybersecurity courses offered by Udemy.

Testing is not a separate activity, it’s part of the developmental effort.

As the Agile manifesto puts it, “individuals and interactions” are more valuable than “comprehensive documentation.” This isn’t to imply that reports don’t have their place; however, it’s essential to pick carefully when and what to document. It’s a key balance to strike and one you need to check to ensure you’re addressing what’s necessary.

In contrast, the waterfall testing strategy is able to adhere to a more strictly linear development process, and is useful for apps that are built with “Point A to Point B” in mind.

What Makes A Good Test Report?

  • Detail: You need to give a detailed depiction of the testing activity, indicating which testing you have performed. Make an effort not to put theoretical and technical information into the report.
  • Clear: All information in the test report should be short and clear to interpret.
  • Standard: The Standard Template needs to be followed. It is simple for stakeholders to survey and guarantee the consistency between test reports in numerous projects.
  • Specific: Incorporate specific points to depict and sum up the test result and spotlight on the primary concern.
  • Content: To make the report detailed and specific we need to look into its content.

It should include

  • Project information: All relevant information should be mentioned, such as name, version, and a brief description.
  • Test Objectives: Mention objectives to give the reader context.
  • Test Summary: It includes a general summary of the testing activity, i.e., how many tests were executed, passed, and failed.
  • Defect: This section includes information about the total number of bugs and their statuses (open, closed, responding), the number of bugs open, resolved, and closed, and a breakdown by priority and complexity.

All these points collectively make a report that properly communicates relevant information between all parties.

Who Are Test Reports Written For?

While making a report, you should think about who it is for and who will need to understand it.

Stakeholders include:

  • Software Testers: the test report can provide explicit data about conditions, item forms, or source information to utilize. The analyzers can help you improve the arrangement, including missing data, and test approaches you hadn’t considered.
  • Project managers: they need to understand what you’re testing and how.
  • Customer support: they can provide information concerning client behaviour, how customers utilize the framework, and the sort of issues they experience. This data can inform what testing exercises you may plan to cover their concerns.
  • Product owners: they can explain how the software is intended to be utilized. This data could help create client profiles, which can help in testing.

How To Write A Good Test Plan

Test Plans can take any number of forms. Some examples are:

  • Word documents are frequently the default structure. Test plans can range from a single page document that sums up the testing to extensive IEEE 829/IEEE 29119 style documents.
  • Mind maps are a fantastic method to pass on testing data in an organized graphical arrangement. Clients can follow the structure and drill down to the level of detail they require.
  • SharePoint/Wikis are both alternatives to Word documents with change management and editing functionality. They offer flexibility in how data is organized with speedy updates and multi-client editing.
  • Web-based planning devices, like Jira, can be utilized in combination with test management tools, like TestRail, to provide an image of all arranged and real testing.
  • Whiteboards/Kanban sheets are another method of demonstrating graphically the extent of testing. Physical sheets are another method of conveying what testing needs to be done.

Going past functional testing

To achieve proactive continuous application improvement, it is important to go past functional testing. Application performance management tools, such as Stackify Retrace, allow for a continuous feedback loop at each step of the SDLC. Using a code profiler, like Stackify Prefix, in your development environment and Retrace APM in your nonprod, qa, and prod environment helps catch bugs at the source.

Source: https://stackify.com/how-to-report-on-software-testing/

#software #testing #report

Hands-On Guide to Datapane Python Tool for Reporting and Visualization

Creating reports using python is an easy task because we can use different python libraries and combine our exploration of the data with some meaningful insights. But sharing such a report is not that easy, because not everyone, including your client, is used to python and able to open your jupyter notebook and understand what you are trying to tell.

Datapane is an open-source python library/framework which makes it easy to turn scripts and notebooks into interactive reports. We can share these reports with our viewers or clients so that they can easily understand what the data is trying to tell.

Datapane allows you to systematically create reports from the objects in your Python notebook, such as pandas DataFrames, plots from visualization libraries, and Markdown text. We can also choose to publish our datapane reports online by selecting the desired audience.

In this article, we will explore how we can create a data report using Datapane and publish it to an HTML file.


#developers corner #data analysis #python #python library #report #visualization

Hands-On Guide to Datapane Python Tool for Reporting and Visualization