10 Popular Golang Libraries for Markup Languages

In today's post we will learn about 10 Popular Golang Libraries for Markup Languages.

What is markup language?

A markup language is a standard text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. The most widely used markup languages are SGML (Standard Generalized Markup Language), HTML (Hypertext Markup Language), and XML (Extensible Markup Language). The markup symbols can be interpreted by a device (computer, printer, browser, etc.) to control how a document should look when printed or displayed on a monitor. A marked-up document thus contains two types of text: the text to be displayed and the markup that describes how to display it.

Table of contents:

  • Bafi - Universal JSON, BSON, YAML, XML translator to ANY format using templates.
  • BBConvert - Converts bbCode to HTML that allows you to add support for custom bbCode tags.
  • Blackfriday - Markdown processor in Go.
  • Mxj - Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
  • Go-output-format - Output go structures into multiple formats (YAML/JSON/etc) in your command line app.
  • Go-toml - Go library for the TOML format with query support and handy cli tools.
  • Goldmark - A Markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured.
  • Goq - Declarative unmarshaling of HTML using struct tags with jQuery syntax (uses GoQuery).
  • Html-to-markdown - Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
  • Htmlquery - An XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression.

1 - Bafi:

Universal JSON, BSON, YAML, XML translator to ANY format using templates.

Key features

  • Various input formats (json, bson, yaml, csv, xml)
  • Flexible output formatting using text templates
  • Support for Lua custom functions which allows very flexible data manipulation
  • stdin/stdout support, which allows a pipeline of getting data from a source -> translating it -> delivering it to a destination. This makes it easy to translate data between different web services, such as REST to SOAP, SOAP to REST, REST to CSV, ...
  • Merge multiple input files in various formats into a single output file formatted using a template

Releases (Windows, MAC, Linux) https://github.com/mmalcek/bafi/releases

usage:

bafi.exe -i testdata.xml -t template.tmpl -o output.txt

or

curl.exe -s https://api.predic8.de/shop/customers/ | bafi.exe -f json -t "?{{toXML .}}"
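
The idea behind this workflow (decode structured input into generic data, then let a text template decide the output format) can be sketched with Go's standard library alone. This is not bafi's own code: the `render` helper, the sample JSON, and the template are hypothetical and only illustrate the technique.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// render decodes a JSON document into generic Go values and passes them
// to a text/template, which decides what the output looks like.
// (Hypothetical helper, not part of bafi.)
func render(input, tmpl string) (string, error) {
	var data interface{}
	if err := json.Unmarshal([]byte(input), &data); err != nil {
		return "", err
	}
	t, err := template.New("out").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Sample input and template are made up for illustration.
	input := `{"customers":[{"name":"Ann","city":"Bonn"},{"name":"Bob","city":"Kiel"}]}`
	tmpl := "{{range .customers}}{{.name}};{{.city}}\n{{end}}"

	out, err := render(input, tmpl)
	if err != nil {
		panic(err)
	}
	fmt.Print(out) // one "name;city" line per customer
}
```

Swapping the template is enough to change the output format, which is the property the feature list above relies on.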

More examples and description in documentation

View on Github

2 - BBConvert:

Converts bbCode to HTML that allows you to add support for custom bbCode tags.

bbConvert is an easy way to process and convert bbCode to whatever you'd like. HTMLConverter is a converter from bbCode to HTML with some defaults (probably more than you'll need) ready if you use ImplementDefaults().

HTML Defaults

[b]Some Text[/b] //bolded text
[bold]Some Text[/bold] //bolded text
[i]Some Text[/i] //italicized text
[italics]Some Text[/italics] //italicized text
[u]Some Text[/u] //underlined text
[underline]Some Text[/underline] //underlined text
[s]Some Text[/s] //strikethrough text
[strike]Some Text[/strike] //strikethrough text
[font=Verdana]Some Text[/font] //text in verdana font
[font size=20pt]Some Text[/font] //20pt size text
[font color=red]Some Text[/font] //red text
[font color=#000000]Some Text[/font] //text with the color of #000000. The # is unnecessary
[font variant=upper]Some Text[/font] //uppercased text
[font variant=lower]Some Text[/font] //lowercase text
[font variant=smallcaps]Some Text[/font] //smallcaps text
[size=20pt]Some Text[/size] //20pt size text
[color=red]Some Text[/color] //red text
[color=#000000]Some Text[/color] //text with the color of #000000. The # is unnecessary
[smallcaps]Some Text[/smallcaps] //smallcaps text
[url]Link address[/url] //linked text
[url=address]Some Text[/url] //linked text
[url title="Title"]Link address[/url] //linked text with title
[link]Link address[/link] //linked text
[link=address]Some Text[/link] //linked text
[link title="Title"]Link address[/link] //linked text with title
[youtube]Youtube URL or video ID[/youtube] //youtube video
[youtube height=200 width=500]Youtube URL or video ID[/youtube] //youtube video with set size
[youtube=500x200]Youtube URL or video ID[/youtube] //youtube video with set size
[youtube left]Youtube URL or video ID[/youtube] //youtube video floated left
[youtube right]Youtube URL or video ID[/youtube] //youtube video floated right
[img]Image URL[/img] //an image
[img=500x200]Image URL[/img] //an image with set size
[img height=200 width=500]Image URL[/img] //an image with set size
[img left]Image URL[/img] //an image floated left
[img right]Image URL[/img] //an image floated right
[img alt="Alternate text"]Image URL[/img] //an image with alternate text
[img title="Title"]Image URL[/img] //an image with title
[image]Image URL[/image] //same as [img] tag
[title]Some Text[/title] //Large text made for use as a title
[t1]Some Text[/t1] //Large text made for use as a title. Same as [title]
[t2]Some Text[/t2] //Slightly smaller text than [t1]. Meant for use as a title of some sort
[t3]Some Text[/t3] //Slightly smaller text than [t2]. Meant for use as a title of some sort
[t4]Some Text[/t4] //Slightly smaller text than [t3]. Meant for use as a title of some sort
[t5]Some Text[/t5] //Slightly smaller text than [t4]. Meant for use as a title of some sort
[t6]Some Text[/t6] //Slightly smaller text than [t5]. Meant for use as a title of some sort
[align=center]Some Text[/align] //Aligns the insides (encapsulates the insides in a div)
[bullet]Bullet 1 * Bullet 2[/bullet] //bulleted list
[ul]
* Item 1
* Item 2
[/ul] //an unordered (bulleted) list
[ol]
* Item 1
* Item 2
[/ol] //an ordered (numbered) list
[bullet] * Item 1 * Item 2[/bullet] //same as [ul]
[number] * Item 1 * Item 2[/number] //same as [ol]
[ul]* Item 1 * Item 2[/ul] //an unordered (bulleted) list
[ol]* Item 1 * Item 2[/ol] //an ordered (numbered) list
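
To make the tag-to-HTML mapping concrete, here is a minimal sketch of the general technique using only the standard library. It does not use bbConvert's API: the `tagRule` type and the `newRule` and `convert` functions are hypothetical, and only three simple paired tags are handled.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// tagRule maps one bbCode tag to an HTML replacement pattern.
// (Hypothetical type for illustration, not bbConvert's.)
type tagRule struct {
	re   *regexp.Regexp
	repl string
}

// newRule builds a rule for a simple [tag]...[/tag] pair. Go's regexp
// package has no backreferences, so each tag gets its own expression.
func newRule(tag, repl string) tagRule {
	name := regexp.QuoteMeta(tag)
	re := regexp.MustCompile(`(?is)\[` + name + `\](.*?)\[/` + name + `\]`)
	return tagRule{re: re, repl: repl}
}

var rules = []tagRule{
	newRule("b", "<b>$1</b>"),
	newRule("i", "<i>$1</i>"),
	newRule("u", "<u>$1</u>"),
}

// convert escapes raw HTML in the input first, then applies every rule.
func convert(s string) string {
	s = html.EscapeString(s)
	for _, r := range rules {
		s = r.re.ReplaceAllString(s, r.repl)
	}
	return s
}

func main() {
	fmt.Println(convert("[b]Bold[/b] and [i]italic[/i]"))
}
```

A real converter also has to deal with tag parameters (such as `[font color=red]`) and unbalanced tags, which this sketch leaves out.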

View on Github

3 - Blackfriday:

Markdown processor in Go.

Blackfriday is a Markdown processor implemented in Go. It is paranoid about its input (so you can safely feed it user-supplied data), it is fast, it supports common extensions (tables, smart punctuation substitutions, etc.), and it is safe for all utf-8 (unicode) input.

HTML output is currently supported, along with Smartypants extensions.

It started as a translation from C of Sundown.

Installation

Blackfriday is compatible with modern Go releases in module mode. With Go installed:

go get github.com/russross/blackfriday

will resolve and add the package to the current development module, then build and install it. Alternatively, you can achieve the same if you import it in a package:

import "github.com/russross/blackfriday"

and go get without parameters.

Old versions of Go and legacy GOPATH mode might work, but no effort is made to keep them working.

Usage

v1

For basic usage, it is as simple as getting your input into a byte slice and calling:

output := blackfriday.MarkdownBasic(input)

This renders it with no extensions enabled. To get a more useful feature set, use this instead:

output := blackfriday.MarkdownCommon(input)

v2

For the most sensible markdown processing, it is as simple as getting your input into a byte slice and calling:

output := blackfriday.Run(input)

Your input will be parsed and the output rendered with a set of most popular extensions enabled. If you want the most basic feature set, corresponding with the bare Markdown specification, use:

output := blackfriday.Run(input, blackfriday.WithNoExtensions())

Sanitize untrusted content

Blackfriday itself does nothing to protect against malicious content. If you are dealing with user-supplied markdown, we recommend running Blackfriday's output through an HTML sanitizer such as Bluemonday.

Here's an example of simple usage of Blackfriday together with Bluemonday:

import (
    "github.com/microcosm-cc/bluemonday"
    // blackfriday.Run is part of the v2 API, which lives under the /v2 module path.
    "github.com/russross/blackfriday/v2"
)

// ...
unsafe := blackfriday.Run(input)
html := bluemonday.UGCPolicy().SanitizeBytes(unsafe)

View on Github

4 - Mxj:

Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-path, including wildcards.
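
To illustrate what decoding XML into a map[string]interface{} and looking up a value by dot-notation path involves, here is a small sketch built on the standard library's encoding/xml. It is not mxj's implementation or API: `xmlToMap`, `decodeElement`, and `valueForPath` are hypothetical helpers, and attributes, wildcards, and list indexing are deliberately ignored.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeElement reads tokens up to the end of the current element and
// returns its trimmed text (leaf) or a map of its child elements.
func decodeElement(d *xml.Decoder) (interface{}, error) {
	children := map[string]interface{}{}
	var text strings.Builder
	for {
		tok, err := d.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			v, err := decodeElement(d)
			if err != nil {
				return nil, err
			}
			name := t.Name.Local
			// Repeated sibling tags are collected into a slice.
			if prev, seen := children[name]; !seen {
				children[name] = v
			} else if list, isList := prev.([]interface{}); isList {
				children[name] = append(list, v)
			} else {
				children[name] = []interface{}{prev, v}
			}
		case xml.CharData:
			text.Write(t)
		case xml.EndElement:
			if len(children) == 0 {
				return strings.TrimSpace(text.String()), nil
			}
			return children, nil
		}
	}
}

// xmlToMap decodes the first element of doc into a nested map.
func xmlToMap(doc string) (map[string]interface{}, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err != nil {
			return nil, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			v, err := decodeElement(d)
			if err != nil {
				return nil, err
			}
			return map[string]interface{}{start.Name.Local: v}, nil
		}
	}
}

// valueForPath walks the map along a dot-separated key path.
func valueForPath(m map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = m
	for _, key := range strings.Split(path, ".") {
		node, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		next, ok := node[key]
		if !ok {
			return nil, false
		}
		cur = next
	}
	return cur, true
}

func main() {
	m, err := xmlToMap(`<book><title>Go</title><author><name>Ann</name></author></book>`)
	if err != nil {
		panic(err)
	}
	v, ok := valueForPath(m, "book.author.name")
	fmt.Println(v, ok)
}
```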

mxj supplants the legacy x2j and j2x packages. If you want the old syntax, use mxj/x2j and mxj/j2x packages.

Installation

Using go.mod:

go get github.com/clbanning/mxj/v2@v2.3.2
import "github.com/clbanning/mxj/v2"

... or just vendor the package.

Related Packages

https://github.com/clbanning/checkxml provides functions for validating XML data.

Refactor Encoder - 2020.05.01

Issue #70 highlighted that encoding large maps does not scale well, since the original logic used string append operations. Using bytes.Buffer results in linear scaling for very large XML docs. (Metrics based on MacBook Pro i7 w/ 16 GB.)

Nodes      m.XML() time
54809       12.53708ms
109780      32.403183ms
164678      59.826412ms
482598     109.358007ms

Refactor Decoder - 2015.11.15

For over a year I've wanted to refactor the XML-to-map[string]interface{} decoder to make it more performant. I recently took the time to do that, since we were using github.com/clbanning/mxj in a production system that could be deployed on a Raspberry Pi. Now the decoder is comparable to the stdlib JSON-to-map[string]interface{} decoder in terms of its additional processing overhead relative to decoding to a structure value. As shown by:

BenchmarkNewMapXml-4         	  100000	     18043 ns/op
BenchmarkNewStructXml-4      	  100000	     14892 ns/op
BenchmarkNewMapJson-4        	  300000	      4633 ns/op
BenchmarkNewStructJson-4     	  300000	      3427 ns/op
BenchmarkNewMapXmlBooks-4    	   20000	     82850 ns/op
BenchmarkNewStructXmlBooks-4 	   20000	     67822 ns/op
BenchmarkNewMapJsonBooks-4   	  100000	     17222 ns/op
BenchmarkNewStructJsonBooks-4	  100000	     15309 ns/op

Notices

2021.02.02: v2.5 - add XmlCheckIsValid toggle to force checking that the encoded XML is valid
2020.12.14: v2.4 - add XMLEscapeCharsDecoder to preserve XML escaped characters in Map values
2020.10.28: v2.3 - add TrimWhiteSpace option
2020.05.01: v2.2 - optimize map to XML encoding for large XML docs.
2019.07.04: v2.0 - remove unnecessary methods - mv.XmlWriterRaw, mv.XmlIndentWriterRaw - for Map and MapSeq.
2019.07.04: Add MapSeq type and move associated functions and methods from Map to MapSeq.
2019.01.21: DecodeSimpleValuesAsMap - decode to map[<tag>:map["#text":<value>]] rather than map[<tag>:<value>]
2018.04.18: mv.Xml/mv.XmlIndent encodes non-map[string]interface{} map values - map[string]string, map[int]uint, etc.
2018.03.29: mv.Gob/NewMapGob support gob encoding/decoding of Maps.
2018.03.26: Added mxj/x2j-wrapper sub-package for migrating from legacy x2j package.
2017.02.22: LeafNode paths can use ".N" syntax rather than "[N]" for list member indexing.
2017.02.10: SetFieldSeparator changes field separator for args in UpdateValuesForPath, ValuesFor... methods.
2017.02.06: Support XMPP stream processing - HandleXMPPStreamTag().
2016.11.07: Preserve name space prefix syntax in XmlSeq parser - NewMapXmlSeq(), etc.
2016.06.25: Support overriding default XML attribute prefix, "-", in Map keys - SetAttrPrefix().
2016.05.26: Support customization of xml.Decoder by exposing CustomDecoder variable.
2016.03.19: Escape invalid chars when encoding XML attribute and element values - XMLEscapeChars().
2016.03.02: By default decoding XML with float64 and bool value casting will not cast "NaN", "Inf", and "-Inf".
            To cast them to float64, first set flag with CastNanInf(true).
2016.02.22: New mv.Root(), mv.Elements(), mv.Attributes methods let you examine XML document structure.
2016.02.16: Add CoerceKeysToLower() option to handle tags with mixed capitalization.
2016.02.12: Seek for first xml.StartElement token; only return error if io.EOF is reached first (handles BOM).
2015.12.02: XML decoding/encoding that preserves original structure of document. See NewMapXmlSeq()
            and mv.XmlSeq() / mv.XmlSeqIndent().
2015-05-20: New: mv.StringIndentNoTypeInfo().
            Also, alphabetically sort map[string]interface{} values by key to prettify output for mv.Xml(),
            mv.XmlIndent(), mv.StringIndent(), mv.StringIndentNoTypeInfo().
2014-11-09: IncludeTagSeqNum() adds "_seq" key with XML doc positional information.
            (NOTE: PreserveXmlList() is similar and will be here soon.)
2014-09-18: inspired by NYTimes fork, added PrependAttrWithHyphen() to allow stripping hyphen from attribute tag.
2014-08-02: AnyXml() and AnyXmlIndent() will try to marshal arbitrary values to XML.
2014-04-28: ValuesForPath() and NewMap() now accept path with indexed array references.

View on Github

5 - Go-output-format:

Output go structures into multiple formats (YAML/JSON/etc) in your command line app.

NOTE: V2 is a breaking compatibility change from V1. Going forward, only V2 will be developed and supported.

A helper utility to output data structures into standardized formats, much like what is built into vault, az and kubectl.

I really like how these apps provide for flexible output, but wanted a way to do it without needing to re-write or copy it for every new tool.

Need to parse some output with jq? JSON is your format. Want to put it out in an easy to read yet still standardized format? YAML is for you!

This tool is intended to provide all that in a single reusable package.
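
The underlying pattern is a small formatter interface with one implementation per output format. The sketch below shows that pattern with the standard library only. It is an assumption about the general design, not gout's actual types: `Formatter`, `jsonFormatter`, `plainFormatter`, and `printer` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Formatter turns any value into bytes in one output format.
// (Hypothetical interface for illustration.)
type Formatter interface {
	Format(v interface{}) ([]byte, error)
}

// jsonFormatter renders values as compact JSON.
type jsonFormatter struct{}

func (jsonFormatter) Format(v interface{}) ([]byte, error) {
	return json.Marshal(v)
}

// plainFormatter renders values with Go's %+v verb.
type plainFormatter struct{}

func (plainFormatter) Format(v interface{}) ([]byte, error) {
	return []byte(fmt.Sprintf("%+v", v)), nil
}

// printer combines a formatter with a destination writer.
type printer struct {
	f Formatter
	w io.Writer
}

// Print formats v and writes it, followed by a newline.
func (p printer) Print(v interface{}) error {
	b, err := p.f.Format(v)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintln(p.w, string(b))
	return err
}

type person struct {
	FirstName string
	LastName  string
}

func main() {
	bob := person{FirstName: "Bob", LastName: "Ross"}

	p := printer{f: jsonFormatter{}, w: os.Stdout}
	if err := p.Print(bob); err != nil {
		panic(err)
	}

	// Switching formats only means swapping the formatter.
	p.f = plainFormatter{}
	if err := p.Print(bob); err != nil {
		panic(err)
	}
}
```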

Usage

Basic

Import with:

import "github.com/drewstinnett/go-output-format/v2/gout"

Example Usage:

import (
 "os"
 "github.com/drewstinnett/go-output-format/v2/gout"
 "github.com/drewstinnett/go-output-format/v2/formats/json"
)

func main() {
 w, err := gout.New()
 if err != nil {
  panic(err)
 }
 // By default, the YAML format is used. Let's change it to JSON.
 w.SetFormatter(json.Formatter{})

 // By default, output goes to stdout. Let's change it to stderr.
 w.SetWriter(os.Stderr)

 // Print it on out!
 w.Print(struct {
  FirstName string
  LastName  string
 }{
  FirstName: "Bob",
  LastName:  "Ross",
 })
 // {"FirstName":"Bob","LastName":"Ross"}
}

Cobra Integration

To simplify using this in new projects, you can use the NewWithCobraCmd method. Example:

// By default, look for a field called 'format'
w, err := gout.NewWithCobraCmd(cmd, nil)
// Or pass a configuration object with what the field is called
w, err = gout.NewWithCobraCmd(cmd, &gout.CobraCmdConfig{
        FormatField: "my-special-name-field",
})

By default, gout writes to os.Stdout.

See _examples for more example usage

View on Github

6 - Go-toml:

Go library for the TOML format with query support and handy cli tools.

Import

import "github.com/pelletier/go-toml/v2"

See Modules.

Features

Stdlib behavior

As much as possible, this library is designed to behave similarly to the standard library's encoding/json.

Performance

While go-toml favors usability, it is written with performance in mind. Most operations should not be shockingly slow. See benchmarks.

Strict mode

The decoder can be set to "strict mode", which makes it return an error when some parts of the TOML document are not present in the target structure. This is a great way to check for typos. See the example in the documentation.
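
Since go-toml models its behavior on encoding/json, the idea of strict decoding can be illustrated with the standard library's JSON decoder, whose DisallowUnknownFields method rejects keys that have no matching struct field. This is an analogy only, not go-toml code: the `config` type and the `decodeStrict` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// config is a made-up target structure for the example.
type config struct {
	Version int    `json:"version"`
	Name    string `json:"name"`
}

// decodeStrict fails when the document contains a key that does not
// map to any field of config, which is how typos get caught.
func decodeStrict(doc string) (config, error) {
	var cfg config
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// "nmae" is a deliberate typo for "name".
	_, err := decodeStrict(`{"version": 2, "nmae": "go-toml"}`)
	fmt.Println("strict decode error:", err)

	cfg, err := decodeStrict(`{"version": 2, "name": "go-toml"}`)
	fmt.Println(cfg.Name, err)
}
```

Without strict decoding the misspelled key would be silently dropped and Name would stay empty.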

Contextualized errors

When most decoding errors occur, go-toml returns a DecodeError, which contains a human-readable, contextualized version of the error. For example:

2| key1 = "value1"
3| key2 = "missing2"
 | ~~~~ missing field
4| key3 = "missing3"
5| key4 = "value4"

Local date and time support

TOML supports native local date/times. They represent a given date, time, or date-time without relation to a timezone or offset. To support this use case, go-toml provides LocalDate, LocalTime, and LocalDateTime. Those types can be transformed to and from time.Time, making them convenient yet unambiguous structures for their respective TOML representation.

Getting started

Given the following struct, let's see how to read it and write it as TOML:

type MyConfig struct {
      Version int
      Name    string
      Tags    []string
}

Unmarshaling

Unmarshal reads a TOML document and fills a Go structure with its content. For example:

doc := `
version = 2
name = "go-toml"
tags = ["go", "toml"]
`

var cfg MyConfig
err := toml.Unmarshal([]byte(doc), &cfg)
if err != nil {
      panic(err)
}
fmt.Println("version:", cfg.Version)
fmt.Println("name:", cfg.Name)
fmt.Println("tags:", cfg.Tags)

// Output:
// version: 2
// name: go-toml
// tags: [go toml]

View on Github

7 - Goldmark:

A Markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured.

Features

  • Standards-compliant. goldmark is fully compliant with the latest CommonMark specification.
  • Extensible. Do you want to add a @username mention syntax to Markdown? You can easily do so in goldmark. You can add your AST nodes, parsers for block-level elements, parsers for inline-level elements, transformers for paragraphs, transformers for the whole AST structure, and renderers.
  • Performance. goldmark's performance is on par with that of cmark, the CommonMark reference implementation written in C.
  • Robust. goldmark is tested with go test --fuzz.
  • Built-in extensions. goldmark ships with common extensions like tables, strikethrough, task lists, and definition lists.
  • Depends only on standard libraries.

Installation

$ go get github.com/yuin/goldmark

Usage

Import packages:

import (
    "bytes"
    "github.com/yuin/goldmark"
)

Convert Markdown documents with the CommonMark-compliant mode:

var buf bytes.Buffer
if err := goldmark.Convert(source, &buf); err != nil {
  panic(err)
}

With options

// Requires: import "github.com/yuin/goldmark/parser"
ctx := parser.NewContext()
var buf bytes.Buffer
if err := goldmark.Convert(source, &buf, parser.WithContext(ctx)); err != nil {
  panic(err)
}

Functional option    Type               Description
parser.WithContext   A parser.Context   Context for the parsing phase.

View on Github

8 - Goq:

Declarative unmarshaling of HTML using struct tags with jQuery syntax (uses GoQuery).

Example

import (
	"log"
	"net/http"

	"astuart.co/goq"
)

// Structured representation for github file name table
type example struct {
	Title string `goquery:"h1"`
	Files []string `goquery:"table.files tbody tr.js-navigation-item td.content,text"`
}

func main() {
	res, err := http.Get("https://github.com/andrewstuart/goq")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()

	var ex example
	
	err = goq.NewDecoder(res.Body).Decode(&ex)
	if err != nil {
		log.Fatal(err)
	}

	log.Println(ex.Title, ex.Files)
}

goq

-- import "astuart.co/goq"

Package goq was built to allow users to declaratively unmarshal HTML into go structs using struct tags composed of css selectors.

I've made a best effort to behave very similarly to JSON and XML decoding as well as exposing as much information as possible in the event of an error to help you debug your Unmarshaling issues.

When creating struct types to be unmarshaled into, the following general rules apply:

Any type that implements the Unmarshaler interface will be passed a slice of *html.Node so that manual unmarshaling may be done. This takes the highest precedence.

Any struct fields may be annotated with goquery metadata, which takes the form of an element selector followed by arbitrary comma-separated "value selectors."

A value selector may be one of html, text, or [someAttrName]. html and text will result in the methods of the same name being called on the *goquery.Selection to obtain the value. [someAttrName] will result in *goquery.Selection.Attr("someAttrName") being called for the value.

A primitive value type will default to the text value of the resulting nodes if no value selector is given.

At least one value selector is required for maps, to determine the map key. The key type must follow both the rules applicable to go map indexing, as well as these unmarshaling rules. The value of each key will be unmarshaled in the same way the element value is unmarshaled.

For maps, keys will be retrieved from the same level of the DOM. The key selector may be arbitrarily nested, but the first level of children with any number of matching elements will be used.

For maps, any values must be nested below the level of the key selector. Parents or siblings of the element matched by the key selector will not be considered.

Once used, a "value selector" will be shifted off of the comma-separated list. This allows you to nest arbitrary levels of value selectors. For example, the type []map[string][]string would require one selector for the map key, and take an optional second selector for the values of the string slice.

Any struct type encountered in nested types (e.g. map[string]SomeStruct) will override any remaining "value selectors" that had not been used. For example, given:

type S struct { F string `goquery:",[bang]"` }

struct { T map[string]S `goquery:"#someId,[foo],[bar],[baz]"` }

[foo] will be used to determine the string map key, but [bar] and [baz] will be ignored, because the [bang] tag present on the S struct type takes precedence.
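
The tag format described above (an element selector followed by comma-separated value selectors) can be inspected with the reflect package. The sketch below is not goq's implementation: `splitTag` and `describe` are hypothetical helpers, the `Link` field is made up, and the naive comma split would break on CSS selectors that themselves contain commas.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// example mirrors the struct from the article, plus a made-up Link field
// that uses an attribute value selector.
type example struct {
	Title string   `goquery:"h1"`
	Files []string `goquery:"table.files tbody tr.js-navigation-item td.content,text"`
	Link  string   `goquery:"a.repo,[href]"`
}

// splitTag separates the element selector from the value selectors.
func splitTag(tag string) (selector string, values []string) {
	parts := strings.Split(tag, ",")
	return parts[0], parts[1:]
}

// describe lists the selector and value selectors of every field of a
// struct value by reading its goquery struct tag.
func describe(v interface{}) []string {
	t := reflect.TypeOf(v)
	var out []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		sel, vals := splitTag(f.Tag.Get("goquery"))
		out = append(out, fmt.Sprintf("%s: selector=%q values=%v", f.Name, sel, vals))
	}
	return out
}

func main() {
	for _, line := range describe(example{}) {
		fmt.Println(line)
	}
}
```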

View on Github

9 - Html-to-markdown:

Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

Convert HTML into Markdown with Go. It uses an HTML parser to avoid regular expressions as much as possible. That should prevent some weird edge cases and allows it to be used where the input is totally unknown.

Installation

go get github.com/JohannesKaufmann/html-to-markdown

Usage

import (
	"fmt"
	"log"

	md "github.com/JohannesKaufmann/html-to-markdown"
)

converter := md.NewConverter("", true, nil)

html := `<strong>Important</strong>`

markdown, err := converter.ConvertString(html)
if err != nil {
  log.Fatal(err)
}
fmt.Println("md ->", markdown)

If you are already using goquery you can pass a selection to Convert.

markdown, err := converter.Convert(selec)

Using it on the command line

If you want to make use of html-to-markdown on the command line without any Go coding, check out html2md, a cli wrapper for html-to-markdown that has all the following options and plugins builtin.

Options

The third parameter to md.NewConverter is *md.Options.

For example you can change the character that is around a bold text ("**") to a different one (for example "__") by changing the value of StrongDelimiter.

opt := &md.Options{
  StrongDelimiter: "__", // default: **
  // ...
}
converter := md.NewConverter("", true, opt)

For all the possible options, look at the godocs; for an example, look at the example.

Adding Rules

converter.AddRules(
  md.Rule{
    Filter: []string{"del", "s", "strike"},
    Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
      // You need to return a pointer to a string (md.String is just a helper function).
      // If you return nil the next function for that html element
      // will be picked. For example you could only convert an element
      // if it has a certain class name and fallback if not.
      content = strings.TrimSpace(content)
      return md.String("~" + content + "~")
    },
  },
  // more rules
)

For more information have a look at the example add_rules.

View on Github

10 - Htmlquery:

An XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression.

Overview

htmlquery is an XPath query package for HTML that lets you extract data from HTML documents, or evaluate expressions against them, using XPath.

htmlquery has a built-in query object cache based on LRU, which caches recently used XPath query strings. Enabling query caching avoids recompiling the same XPath expression on every query.
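
To show what an LRU cache of compiled queries looks like in general, here is a standalone sketch. It is not htmlquery's cache: the `lruCache` type is hypothetical, and `regexp.Compile` stands in for XPath compilation so that the example needs only the standard library.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
)

// entry is one cached expression together with its source string.
type entry struct {
	key  string
	expr *regexp.Regexp
}

// lruCache keeps the most recently used compiled expressions.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // source string -> element in order
	compiles int                      // how many times we really compiled
}

func newLRU(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// get returns a compiled expression, compiling it only on a cache miss
// and evicting the least recently used entry when the cache is full.
func (c *lruCache) get(src string) (*regexp.Regexp, error) {
	if el, ok := c.items[src]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).expr, nil
	}
	expr, err := regexp.Compile(src)
	if err != nil {
		return nil, err
	}
	c.compiles++
	c.items[src] = c.order.PushFront(&entry{key: src, expr: expr})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return expr, nil
}

func main() {
	cache := newLRU(2)
	for _, q := range []string{"a+", "a+", "b+", "a+"} {
		if _, err := cache.get(q); err != nil {
			panic(err)
		}
	}
	fmt.Println("compilations:", cache.compiles)
}
```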

You can visit this page to learn about the supported XPath(1.0/2.0) syntax. https://github.com/antchfx/xpath

Installation

go get github.com/antchfx/htmlquery

Getting Started

Query: returns the matched elements or an error.

nodes, err := htmlquery.QueryAll(doc, "//a")
if err != nil {
	panic(`not a valid XPath expression.`)
}

Load HTML document from URL.

doc, err := htmlquery.LoadURL("http://example.com/")

Load HTML document from file.

filePath := "/home/user/sample.html"
doc, err := htmlquery.LoadDoc(filePath)

Load HTML document from string.

s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))

Find all A elements.

list := htmlquery.Find(doc, "//a")

Find all A elements that have href attribute.

list := htmlquery.Find(doc, "//a[@href]")	

Find all A elements with href attribute and only return href value.

list := htmlquery.Find(doc, "//a/@href")
for _, n := range list {
	fmt.Println(htmlquery.SelectAttr(n, "href")) // output @href value
}

Find the third A element.

a := htmlquery.FindOne(doc, "//a[3]")

Find the child img element under an A element and print its source.

a := htmlquery.FindOne(doc, "//a")
img := htmlquery.FindOne(a, "//img")
fmt.Println(htmlquery.SelectAttr(img, "src")) // output @src value

Count all IMG elements.

expr, _ := xpath.Compile("count(//img)")
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %f", v)

View on Github

Thank you for following this article.

Related videos:

Golang Web Frameworks You MUST Learn (2022)

#go #golang #html #language 

What is GEEK

Buddha Community

10 Popular Golang Libraries for Markup Languages

10 Popular Golang Libraries for Markup Languages

In today's post we will learn about 10 Popular Golang Libraries for Markup Languages.

What is markup language?

Markup language, standard text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. The most widely used markup languages are SGML (Standard Generalized Markup Language), HTML (Hypertext Markup Language), and XML (Extensible Markup Language). The markup symbols can be interpreted by a device (computer, printer, browser, etc.) to control how a document should look when printed or displayed on a monitor. A marked-up document thus contains two types of text: text to be displayed and markup language on how to display it.

Table of contents:

  • Bafi - Universal JSON, BSON, YAML, XML translator to ANY format using templates.
  • BBConvert - Converts bbCode to HTML that allows you to add support for custom bbCode tags.
  • Blackfriday - Markdown processor in Go.
  • Mxj - Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
  • Go-output-format - Output go structures into multiple formats (YAML/JSON/etc) in your command line app.
  • Go-toml - Go library for the TOML format with query support and handy cli tools.
  • Goldmark - A Markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured.
  • Goq - Declarative unmarshaling of HTML using struct tags with jQuery syntax (uses GoQuery).
  • Html-to-markdown - Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
  • Htmlquery - An XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression.

1 - Bafi:

Universal JSON, BSON, YAML, XML translator to ANY format using templates.

Key features

  • Various input formats (json, bson, yaml, csv, xml)
  • Flexible output formatting using text templates
  • Support for Lua custom functions which allows very flexible data manipulation
  • stdin/stdout support which allows get data from source -> translate -> delivery to destination. This allows easily translate data between different web services like REST to SOAP, SOAP to REST, REST to CSV, ...
  • Merge multiple input files in various formats into single output file formated using template

Releases (Windows, MAC, Linux) https://github.com/mmalcek/bafi/releases

usage:

bafi.exe -i testdata.xml -t template.tmpl -o output.txt

or

curl.exe -s https://api.predic8.de/shop/customers/ | bafi.exe -f json -t "?{{toXML .}}"

More examples and description in documentation

View on Github

2 - BBConvert:

Converts bbCode to HTML that allows you to add support for custom bbCode tags.

bbConvert is an easy way to process and convert bbCode to whatever you'd like. HTMLConverter is a converter from bbCode to HTML with some defaults (probably more than you'll need) ready if you use ImplementDefaults().

HTML Defaults

[b]Some Text[/b] //bolded text
[bold]Some Text[/bold] //bolded text
[i]Some Text[/i] //italicized text
[italics]Some Text[/italics] //italicized text
[u]Some Text[/u] //underlined text
[underline]Some Text[/underline] //underlined text
[s]some Text[/s] //strikedthrough text
[strike]Some Text[/strike] //strikethrough text
[font=Verdana]Some Text[/font] //text in verdana font
[font size=20pt]Some Text[/font] //20pt size text
[font color=red]Some Text[/font] //red text
[font color=#000000]Some Text[/font] //text with the color of #000000. The # is unnecessary
[font variant=upper]Some Text[/font] //uppercased text
[font variant=lower]Some Text[/font] //lowercase text
[font variant=smallcaps]Some Text[/font] //smallcaps text
[size=20pt]Some Text[/size] //20pt size text
[color=red]Some Text[/color] //red text
[color=#000000]Some Text[/color] //text with the color of #000000. The # is unnecessary
[smallcaps]Some Text[/smallcaps] //smallcaps text
[url]Link address[/url] //linked text
[url=address]Some Text[/url] //linked text
[url title="Title"]Link address[/url] //linked text with title
[link]Link address[/link] //linked text
[link=address]Some Text[/link] //linked text
[link title="Title"]Link address[/link] //linked text with title
[youtube]Youtube URL or video ID[/youtube] //youtube video
[youtube height=200 width=500]Youtube URL or video ID[/youtube] //youtube video with set size
[youtube=500x200]Youtube URL or video ID[/youtube] //youtube video with set size
[youtube left]Youtube URL or video ID[/youtube] //youtube video floated left
[youtube right]Youtube URL or video ID[/youtube] //youtube video floated right
[img]Image URL[/img] //an image
[img=500x200]Image URL[/img] //an image with set size
[img height=200 width=500]Image URL[/img] //an image with set size
[img left]Image URL[/img] //an image floated left
[img right]Image URL[/img] //an image floated right
[img alt="Alternate text"]Image URL[/img] //an image with alternate text
[img title="Title"]Image URL[/img] //an image with title
[image]Image URL[/image] //same as [img] tag
[title]Some Text[/title] //Large text made for use as a title
[t1]Some Text[/t1] //Large text made for use as a title. Same as [title]
[t2]Some Text[/t2] //Slightly smaller text than [t1]. Meant for use as a title of some sort
[t3]Some Text[/t3] //Slightly smaller text than [t2]. Meant for use as a title of some sort
[t4]Some Text[/t4] //Slightly smaller text than [t3]. Meant for use as a title of some sort
[t5]Some Text[/t5] //Slightly smaller text than [t4]. Meant for use as a title of some sort
[t6]Some Text[/t6] //Slightly smaller text than [t5]. Meant for use as a title of some sort
[align=center]Some Text[/align] //Aligns the insides (encapsulates the insides in a div)
[bullet]Bullet 1 * Bullet 2[/bullet] //bulleted list
[ul]
* Item 1
Item 2
[/ul] //an unordered (bulleted) list
[ol]
* Item 1
Item 2
[/ol] //an ordered (numbered) list
[bullet] * Item 1 * Item 2[/bullet] //same as [ul]
[number] * Item 1 * Item 2[/number] //same as [ol]
[ul]* Item 1 * Item 2[/ul] //an unordered (bulleted) list
[ol]* Item 1 * Item 2[/ol] //an ordered (numbered) list

View on Github

3 - Blackfriday:

Markdown processor in Go.

Blackfriday is a Markdown processor implemented in Go. It is paranoid about its input (so you can safely feed it user-supplied data), it is fast, it supports common extensions (tables, smart punctuation substitutions, etc.), and it is safe for all utf-8 (unicode) input.

HTML output is currently supported, along with Smartypants extensions.

It started as a translation from C of Sundown.

Installation

Blackfriday is compatible with modern Go releases in module mode. With Go installed:

go get github.com/russross/blackfriday/v2

will resolve and add the package to the current development module, then build and install it. Alternatively, you can achieve the same if you import it in a package:

import "github.com/russross/blackfriday/v2"

and go get without parameters.

Old versions of Go and legacy GOPATH mode might work, but no effort is made to keep them working.

Usage

v1

For basic usage, it is as simple as getting your input into a byte slice and calling:

output := blackfriday.MarkdownBasic(input)

This renders it with no extensions enabled. To get a more useful feature set, use this instead:

output := blackfriday.MarkdownCommon(input)

v2

For the most sensible markdown processing, it is as simple as getting your input into a byte slice and calling:

output := blackfriday.Run(input)

Your input will be parsed and the output rendered with a set of most popular extensions enabled. If you want the most basic feature set, corresponding with the bare Markdown specification, use:

output := blackfriday.Run(input, blackfriday.WithNoExtensions())

Sanitize untrusted content

Blackfriday itself does nothing to protect against malicious content. If you are dealing with user-supplied markdown, we recommend running Blackfriday's output through an HTML sanitizer such as Bluemonday.

Here's an example of simple usage of Blackfriday together with Bluemonday:

import (
    "github.com/microcosm-cc/bluemonday"
    "github.com/russross/blackfriday/v2" // blackfriday.Run is part of the v2 API
)

// ...
unsafe := blackfriday.Run(input)
html := bluemonday.UGCPolicy().SanitizeBytes(unsafe)

View on Github

4 - Mxj:

Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-path, including wildcards.

mxj supplants the legacy x2j and j2x packages. If you want the old syntax, use mxj/x2j and mxj/j2x packages.

Installation

Using go.mod:

go get github.com/clbanning/mxj/v2@v2.3.2
import "github.com/clbanning/mxj/v2"

... or just vendor the package.

Related Packages

https://github.com/clbanning/checkxml provides functions for validating XML data.

Refactor Encoder - 2020.05.01

Issue #70 highlighted that encoding large maps does not scale well, since the original logic used string append operations. Using bytes.Buffer results in linear scaling for very large XML docs. (Metrics based on MacBook Pro i7 w/ 16 GB.)

Nodes      m.XML() time
54809       12.53708ms
109780      32.403183ms
164678      59.826412ms
482598     109.358007ms

Refactor Decoder - 2015.11.15

For over a year I've wanted to refactor the XML-to-map[string]interface{} decoder to make it more performant. I recently took the time to do that, since we were using github.com/clbanning/mxj in a production system that could be deployed on a Raspberry Pi. Now the decoder is comparable to the stdlib JSON-to-map[string]interface{} decoder in terms of its additional processing overhead relative to decoding to a structure value. As shown by:

BenchmarkNewMapXml-4         	  100000	     18043 ns/op
BenchmarkNewStructXml-4      	  100000	     14892 ns/op
BenchmarkNewMapJson-4        	  300000	      4633 ns/op
BenchmarkNewStructJson-4     	  300000	      3427 ns/op
BenchmarkNewMapXmlBooks-4    	   20000	     82850 ns/op
BenchmarkNewStructXmlBooks-4 	   20000	     67822 ns/op
BenchmarkNewMapJsonBooks-4   	  100000	     17222 ns/op
BenchmarkNewStructJsonBooks-4	  100000	     15309 ns/op

Notices

2021.02.02: v2.5 - add XmlCheckIsValid toggle to force checking that the encoded XML is valid
2020.12.14: v2.4 - add XMLEscapeCharsDecoder to preserve XML escaped characters in Map values
2020.10.28: v2.3 - add TrimWhiteSpace option
2020.05.01: v2.2 - optimize map to XML encoding for large XML docs.
2019.07.04: v2.0 - remove unnecessary methods - mv.XmlWriterRaw, mv.XmlIndentWriterRaw - for Map and MapSeq.
2019.07.04: Add MapSeq type and move associated functions and methods from Map to MapSeq.
2019.01.21: DecodeSimpleValuesAsMap - decode to map[<tag>:map["#text":<value>]] rather than map[<tag>:<value>]
2018.04.18: mv.Xml/mv.XmlIndent encodes non-map[string]interface{} map values - map[string]string, map[int]uint, etc.
2018.03.29: mv.Gob/NewMapGob support gob encoding/decoding of Maps.
2018.03.26: Added mxj/x2j-wrapper sub-package for migrating from legacy x2j package.
2017.02.22: LeafNode paths can use ".N" syntax rather than "[N]" for list member indexing.
2017.02.10: SetFieldSeparator changes field separator for args in UpdateValuesForPath, ValuesFor... methods.
2017.02.06: Support XMPP stream processing - HandleXMPPStreamTag().
2016.11.07: Preserve name space prefix syntax in XmlSeq parser - NewMapXmlSeq(), etc.
2016.06.25: Support overriding default XML attribute prefix, "-", in Map keys - SetAttrPrefix().
2016.05.26: Support customization of xml.Decoder by exposing CustomDecoder variable.
2016.03.19: Escape invalid chars when encoding XML attribute and element values - XMLEscapeChars().
2016.03.02: By default decoding XML with float64 and bool value casting will not cast "NaN", "Inf", and "-Inf".
            To cast them to float64, first set flag with CastNanInf(true).
2016.02.22: New mv.Root(), mv.Elements(), mv.Attributes methods let you examine XML document structure.
2016.02.16: Add CoerceKeysToLower() option to handle tags with mixed capitalization.
2016.02.12: Seek for first xml.StartElement token; only return error if io.EOF is reached first (handles BOM).
2015.12.02: XML decoding/encoding that preserves original structure of document. See NewMapXmlSeq()
            and mv.XmlSeq() / mv.XmlSeqIndent().
2015-05-20: New: mv.StringIndentNoTypeInfo().
            Also, alphabetically sort map[string]interface{} values by key to prettify output for mv.Xml(),
            mv.XmlIndent(), mv.StringIndent(), mv.StringIndentNoTypeInfo().
2014-11-09: IncludeTagSeqNum() adds "_seq" key with XML doc positional information.
            (NOTE: PreserveXmlList() is similar and will be here soon.)
2014-09-18: inspired by NYTimes fork, added PrependAttrWithHyphen() to allow stripping hyphen from attribute tag.
2014-08-02: AnyXml() and AnyXmlIndent() will try to marshal arbitrary values to XML.
2014-04-28: ValuesForPath() and NewMap() now accept path with indexed array references.

View on Github

5 - Go-output-format:

Output go structures into multiple formats (YAML/JSON/etc) in your command line app.

NOTE: V2 is a breaking compatibility change from V1. Going forward, only V2 will be developed and supported.

Helper utility to output data structures into standardized formats, much like what is built into vault, az, and kubectl.

I really like how these apps provide for flexible output, but wanted a way to do it without needing to re-write or copy it for every new tool.

Need to parse some output with jq? JSON is your format. Want to put it out in an easy to read yet still standardized format? YAML is for you!

This tool is intended to provide all that in a single reusable package.

Usage

Basic

Import with:

import "github.com/drewstinnett/go-output-format/v2/gout"

Example Usage:

import (
 "os"
 "github.com/drewstinnett/go-output-format/v2/gout"
 "github.com/drewstinnett/go-output-format/v2/formats/json"
)

func main() {
 w, err := gout.New()
 if err != nil {
  panic(err)
 }
 // By default, the YAML format is used. Let's change it to JSON though
 w.SetFormatter(json.Formatter{})

 // By Default, print to stdout. Let's change it to stderr though
 w.SetWriter(os.Stderr)

 // Print it on out!
 w.Print(struct {
  FirstName string
  LastName  string
 }{
  FirstName: "Bob",
  LastName:  "Ross",
 })
 // {"FirstName":"Bob","LastName":"Ross"}
}

Cobra Integration

To simplify using this in new projects, you can use the NewWithCobraCmd method. Example:

// By default, look for a field called 'format'
w, err := NewWithCobraCmd(cmd, nil)
// Or pass a configuration object with what the field is called
w, err := NewWithCobraCmd(cmd, &gout.CobraCmdConfig{
        FormatField: "my-special-name-field",
})

By default, gout writes to os.Stdout.

See _examples for more example usage

View on Github

6 - Go-toml:

Go library for the TOML format with query support and handy cli tools.

Import

import "github.com/pelletier/go-toml/v2"

See Modules.

Features

Stdlib behavior

As much as possible, this library is designed to behave similarly to the standard library's encoding/json.

Performance

While go-toml favors usability, it is written with performance in mind. Most operations should not be shockingly slow. See benchmarks.

Strict mode

Decoder can be set to "strict mode", which makes it return an error when some parts of the TOML document are not present in the target structure. This is a great way to check for typos. See example in the documentation.

Contextualized errors

When most decoding errors occur, go-toml returns a DecodeError, which contains a human-readable, contextualized version of the error. For example:

2| key1 = "value1"
3| key2 = "missing2"
 | ~~~~ missing field
4| key3 = "missing3"
5| key4 = "value4"

Local date and time support

TOML supports native local date/times. They let you represent a given date, time, or date-time without relation to a timezone or offset. To support this use case, go-toml provides LocalDate, LocalTime, and LocalDateTime. Those types can be transformed to and from time.Time, making them convenient yet unambiguous structures for their respective TOML representation.

Getting started

Given the following struct, let's see how to read it and write it as TOML:

type MyConfig struct {
      Version int
      Name    string
      Tags    []string
}

Unmarshaling

Unmarshal reads a TOML document and fills a Go structure with its content. For example:

doc := `
version = 2
name = "go-toml"
tags = ["go", "toml"]
`

var cfg MyConfig
err := toml.Unmarshal([]byte(doc), &cfg)
if err != nil {
      panic(err)
}
fmt.Println("version:", cfg.Version)
fmt.Println("name:", cfg.Name)
fmt.Println("tags:", cfg.Tags)

// Output:
// version: 2
// name: go-toml
// tags: [go toml]

View on Github

7 - Goldmark:

A Markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured.

Features

  • Standards-compliant. goldmark is fully compliant with the latest CommonMark specification.
  • Extensible. Do you want to add a @username mention syntax to Markdown? You can easily do so in goldmark. You can add your AST nodes, parsers for block-level elements, parsers for inline-level elements, transformers for paragraphs, transformers for the whole AST structure, and renderers.
  • Performance. goldmark's performance is on par with that of cmark, the CommonMark reference implementation written in C.
  • Robust. goldmark is tested with go test --fuzz.
  • Built-in extensions. goldmark ships with common extensions like tables, strikethrough, task lists, and definition lists.
  • Depends only on standard libraries.

Installation

$ go get github.com/yuin/goldmark

Usage

Import packages:

import (
    "bytes"
    "github.com/yuin/goldmark"
)

Convert Markdown documents with the CommonMark-compliant mode:

var buf bytes.Buffer
if err := goldmark.Convert(source, &buf); err != nil {
  panic(err)
}

With options

var buf bytes.Buffer
if err := goldmark.Convert(source, &buf, parser.WithContext(ctx)); err != nil {
  panic(err)
}

Functional option    Type                Description
parser.WithContext   A parser.Context    Context for the parsing phase.

View on Github

8 - Goq:

Declarative unmarshaling of HTML using struct tags with jQuery syntax (uses GoQuery).

Example

import (
	"log"
	"net/http"

	"astuart.co/goq"
)

// Structured representation for github file name table
type example struct {
	Title string `goquery:"h1"`
	Files []string `goquery:"table.files tbody tr.js-navigation-item td.content,text"`
}

func main() {
	res, err := http.Get("https://github.com/andrewstuart/goq")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()

	var ex example
	
	err = goq.NewDecoder(res.Body).Decode(&ex)
	if err != nil {
		log.Fatal(err)
	}

	log.Println(ex.Title, ex.Files)
}

goq

-- import "astuart.co/goq"

Package goq was built to allow users to declaratively unmarshal HTML into go structs using struct tags composed of css selectors.

I've made a best effort to behave very similarly to JSON and XML decoding as well as exposing as much information as possible in the event of an error to help you debug your Unmarshaling issues.

When creating struct types to be unmarshaled into, the following general rules apply:

Any type that implements the Unmarshaler interface will be passed a slice of *html.Node so that manual unmarshaling may be done. This takes the highest precedence.

Any struct fields may be annotated with goquery metadata, which takes the form of an element selector followed by arbitrary comma-separated "value selectors."

A value selector may be one of html, text, or [someAttrName]. html and text will result in the methods of the same name being called on the *goquery.Selection to obtain the value. [someAttrName] will result in *goquery.Selection.Attr("someAttrName") being called for the value.

A primitive value type will default to the text value of the resulting nodes if no value selector is given.

At least one value selector is required for maps, to determine the map key. The key type must follow both the rules applicable to go map indexing, as well as these unmarshaling rules. The value of each key will be unmarshaled in the same way the element value is unmarshaled.

For maps, keys will be retrieved from the same level of the DOM. The key selector may be arbitrarily nested, but the first level of children with any number of matching elements will be used.

For maps, any values must be nested below the level of the key selector. Parents or siblings of the element matched by the key selector will not be considered.

Once used, a "value selector" will be shifted off of the comma-separated list. This allows you to nest arbitrary levels of value selectors. For example, the type []map[string][]string would require one selector for the map key, and take an optional second selector for the values of the string slice.

Any struct type encountered in nested types (e.g. map[string]SomeStruct) will override any remaining "value selectors" that had not been used. For example, given:

type S struct { F string `goquery:",[bang]"` }

struct { T map[string]S `goquery:"#someId,[foo],[bar],[baz]"` }

[foo] will be used to determine the string map key, but [bar] and [baz] will be ignored, with the [bang] tag present on the S struct type taking precedence.

View on Github

9 - Html-to-markdown:

Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

Convert HTML into Markdown with Go. It uses an HTML parser to avoid the use of regexp as much as possible. That should prevent some weird cases and allows it to be used where the input is totally unknown.

Installation

go get github.com/JohannesKaufmann/html-to-markdown

Usage

import (
	"fmt"
	"log"

	md "github.com/JohannesKaufmann/html-to-markdown"
)

converter := md.NewConverter("", true, nil)

html := `<strong>Important</strong>`

markdown, err := converter.ConvertString(html)
if err != nil {
  log.Fatal(err)
}
fmt.Println("md ->", markdown)

If you are already using goquery you can pass a selection to Convert.

markdown, err := converter.Convert(selec)

Using it on the command line

If you want to make use of html-to-markdown on the command line without any Go coding, check out html2md, a cli wrapper for html-to-markdown that has all the following options and plugins builtin.

Options

The third parameter to md.NewConverter is *md.Options.

For example you can change the character that is around a bold text ("**") to a different one (for example "__") by changing the value of StrongDelimiter.

opt := &md.Options{
  StrongDelimiter: "__", // default: **
  // ...
}
converter := md.NewConverter("", true, opt)

For all the possible options, look at the godocs; for a working example, look at the example.

Adding Rules

converter.AddRules(
  md.Rule{
    Filter: []string{"del", "s", "strike"},
    Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
      // You need to return a pointer to a string (md.String is just a helper function).
      // If you return nil the next function for that html element
      // will be picked. For example you could only convert an element
      // if it has a certain class name and fallback if not.
      content = strings.TrimSpace(content)
      return md.String("~" + content + "~")
    },
  },
  // more rules
)

For more information have a look at the example add_rules.

View on Github

10 - Htmlquery:

An XPath query package for HTML, lets you extract data or evaluate from HTML documents by an XPath expression.

Overview

htmlquery is an XPath query package for HTML that lets you extract data from HTML documents or evaluate XPath expressions against them.

htmlquery has a built-in query object cache based on LRU, which caches recently used XPath query strings. Enabling query caching avoids recompiling the XPath expression on each query.

You can visit this page to learn about the supported XPath(1.0/2.0) syntax. https://github.com/antchfx/xpath

Installation

go get github.com/antchfx/htmlquery

Getting Started

Query, returns matched elements or error.

nodes, err := htmlquery.QueryAll(doc, "//a")
if err != nil {
	panic(`not a valid XPath expression.`)
}

Load HTML document from URL.

doc, err := htmlquery.LoadURL("http://example.com/")

Load HTML document from a file.

filePath := "/home/user/sample.html"
doc, err := htmlquery.LoadDoc(filePath)

Load HTML document from string.

s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))

Find all A elements.

list := htmlquery.Find(doc, "//a")

Find all A elements that have href attribute.

list := htmlquery.Find(doc, "//a[@href]")	

Find all A elements with href attribute and only return href value.

list := htmlquery.Find(doc, "//a/@href")	
for _ , n := range list{
	fmt.Println(htmlquery.SelectAttr(n, "href")) // output @href value
}

Find the third A element.

a := htmlquery.FindOne(doc, "//a[3]")

Find children element (img) under A href and print the source

a := htmlquery.FindOne(doc, "//a")
img := htmlquery.FindOne(a, "//img")
fmt.Println(htmlquery.SelectAttr(img, "src")) // output @src value

Evaluate the number of all IMG elements.

expr, _ := xpath.Compile("count(//img)")
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %f", v)

View on Github

Thank you for following this article.


How to Create Arrays in Python

In this tutorial, you'll know the basics of how to create arrays in Python using the array module. Learn how to use Python arrays. You'll see how to define them and the different methods commonly used for performing operations on them.


Python Array Tutorial – Define, Index, Methods

In this article, you'll learn how to use Python arrays. You'll see how to define them and the different methods commonly used for performing operations on them.

The article covers arrays that you create by importing the array module. We won't cover NumPy arrays here.

Table of Contents

  1. Introduction to Arrays
    1. The differences between Lists and Arrays
    2. When to use arrays
  2. How to use arrays
    1. Define arrays
    2. Find the length of arrays
    3. Array indexing
    4. Search through arrays
    5. Loop through arrays
    6. Slice an array
  3. Array methods for performing operations
    1. Change an existing value
    2. Add a new value
    3. Remove a value
  4. Conclusion

Let's get started!

What are Python Arrays?

Arrays are a fundamental data structure, and an important part of most programming languages. In Python, they are containers which are able to store more than one item at the same time.

Specifically, they are an ordered collection of elements with every value being of the same data type. That is the most important thing to remember about Python arrays - the fact that they can only hold a sequence of multiple items that are of the same type.

What's the Difference between Python Lists and Python Arrays?

Lists are one of the most common data structures in Python, and a core part of the language.

Lists and arrays behave similarly.

Just like arrays, lists are an ordered sequence of elements.

They are also mutable and not fixed in size, which means they can grow and shrink throughout the life of the program. Items can be added and removed, making them very flexible to work with.

However, lists and arrays are not the same thing.

Lists store items that are of various data types. This means that a list can contain integers, floating point numbers, strings, or any other Python data type, at the same time. That is not the case with arrays.

As mentioned in the section above, arrays store only items that are of the same single data type. There are arrays that contain only integers, or only floating point numbers, or only Unicode characters. The array module supports a fixed set of basic types, each identified by a typecode.
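
To make the contrast concrete, here is a small sketch that uses only the standard library. The variable names (mixed, accepted, rejected) are made up for illustration and are not part of the array module.

```python
import array as arr

# a list can hold values of different types at the same time
mixed = [1, 2.5, "three"]
accepted = len(mixed) == 3

# the same values cannot be stored in an integer array
try:
    arr.array('i', mixed)
    rejected = False
except TypeError as error:
    rejected = True
    print("array refused the values:", error)
```

The list keeps all three values, while the array constructor stops at the first value that does not match the 'i' typecode.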

When to Use Python Arrays

Lists are built into the Python programming language, whereas arrays aren't. Arrays are not a built-in data structure, and therefore need to be imported via the array module in order to be used.

Arrays of the array module are a thin wrapper over C arrays, and are useful when you want to work with homogeneous data.

They are also more compact and take up less memory and space which makes them more size efficient compared to lists.
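
One rough way to see the size difference is to compare what each container needs for the same 1000 integers. This is only a sketch: the exact numbers depend on your Python version and platform, and the names list_total and array_total are invented for this example.

```python
import array as arr
import sys

values = list(range(1000))
numbers_array = arr.array('i', values)

# a list stores references, so count the list itself plus every int object
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# an array stores the raw C values inline, so one call covers everything
array_total = sys.getsizeof(numbers_array)

# itemsize is the number of bytes used by each element
print("bytes per array element:", numbers_array.itemsize)
print("list total:", list_total)
print("array total:", array_total)
```

The array total should come out noticeably smaller, because each element is stored as a plain C integer instead of a full Python object.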

If you want to perform mathematical calculations, then you should use NumPy arrays by importing the NumPy package. Otherwise, use Python arrays only when you really need to, as lists work in a similar way and are more flexible to work with.

How to Use Arrays in Python

In order to create Python arrays, you'll first have to import the array module, which contains all the necessary functions.

There are three ways you can import the array module:

  • By using import array at the top of the file. This includes the module array. You would then go on to create an array using array.array().
import array

#how you would create an array
array.array()
  • Instead of having to type array.array() all the time, you could use import array as arr at the top of the file, instead of import array alone. You would then create an array by typing arr.array(). The arr acts as an alias name, with the array constructor then immediately following it.
import array as arr

#how you would create an array
arr.array()
  • Lastly, you could also use from array import *, with * importing all the functionalities available. You would then create an array by writing the array() constructor alone.
from array import *

#how you would create an array
array()

How to Define Arrays in Python

Once you've imported the array module, you can then go on to define a Python array.

The general syntax for creating an array looks like this:

variable_name = array(typecode,[elements])

Let's break it down:

  • variable_name would be the name of the array.
  • The typecode specifies what kind of elements would be stored in the array: integers of a particular size, floating point numbers, or Unicode characters. Remember that all elements should be of the same data type.
  • Inside square brackets you mention the elements that would be stored in the array, with each element being separated by a comma. You can also create an empty array by just writing variable_name = array(typecode) alone, without any elements.

Below is a typecode table, with the different typecodes that can be used with the different data types when defining Python arrays:

TYPECODE   C TYPE               PYTHON TYPE          MINIMUM SIZE IN BYTES
'b'        signed char          int                  1
'B'        unsigned char        int                  1
'u'        wchar_t              Unicode character    2
'h'        signed short         int                  2
'H'        unsigned short       int                  2
'i'        signed int           int                  2
'I'        unsigned int         int                  2
'l'        signed long          int                  4
'L'        unsigned long        int                  4
'q'        signed long long     int                  8
'Q'        unsigned long long   int                  8
'f'        float                float                4
'd'        double               float                8
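
The sizes in the table are minimums, and the actual size of a typecode depends on your platform. As a quick sketch, you can ask an empty array for its itemsize to see what your own machine uses. The sizes dictionary is just a name chosen for this example, and the 'u' typecode is left out because it is deprecated in recent Python versions.

```python
import array as arr

# build a typecode -> bytes-per-element mapping for this platform
sizes = {}
for code in "bBhHiIlLqQfd":
    # an empty array is enough to read the element size
    sizes[code] = arr.array(code).itemsize

for code, size in sizes.items():
    print(code, size)
```

On many systems 'i' reports 4 bytes rather than the minimum of 2, which is why the table should be read as a lower bound.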

Tying everything together, here is an example of how you would define an array in Python:

import array as arr 

numbers = arr.array('i',[10,20,30])


print(numbers)

#output

#array('i', [10, 20, 30])

Let's break it down:

  • First we included the array module, in this case with import array as arr .
  • Then, we created a numbers array.
  • We used arr.array() because of import array as arr .
  • Inside the array() constructor, we first included 'i', the typecode for signed integers. A signed integer array can hold both positive and negative values. An unsigned typecode, such as 'H' (unsigned short), does not allow negative values.
  • Lastly, we included the values to be stored in the array in square brackets.
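
The difference between signed and unsigned typecodes can be sketched like this. The 'h' and 'H' typecodes come from the table above, while the variable names are only for illustration.

```python
import array as arr

# a signed short array accepts negative values
signed = arr.array('h', [-5, 0, 5])

# an unsigned short array refuses them
try:
    unsigned = arr.array('H', [-5, 0, 5])
    negative_rejected = False
except OverflowError as error:
    negative_rejected = True
    print("rejected:", error)

print(signed)
```

Note that the error here is an OverflowError, because the value is an integer but falls outside the range the typecode can represent.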

Keep in mind that if you tried to include values that were not of i typecode, meaning they were not integer values, you would get an error:

import array as arr 

numbers = arr.array('i',[10.0,20,30])


print(numbers)

#output

#Traceback (most recent call last):
# File "/Users/dionysialemonaki/python_articles/demo.py", line 14, in <module>
#   numbers = arr.array('i',[10.0,20,30])
#TypeError: 'float' object cannot be interpreted as an integer

In the example above, I tried to include a floating point number in the array. I got an error because this is meant to be an integer array only.

Another way to create an array is the following:

from array import *

#an array of floating point values
numbers = array('d',[10.0,20.0,30.0])

print(numbers)

#output

#array('d', [10.0, 20.0, 30.0])

The example above imported the array module via from array import * and created an array numbers of float data type. This means that it holds only floating point numbers, which is specified with the 'd' typecode.

How to Find the Length of an Array in Python

To find out the exact number of elements contained in an array, use the built-in len() function.

It will return the integer number that is equal to the total number of elements in the array you specify.

import array as arr 

numbers = arr.array('i',[10,20,30])


print(len(numbers))

#output
# 3

In the example above, the array contained three elements – 10, 20, 30 – so the length of numbers is 3.

Array Indexing and How to Access Individual Items in an Array in Python

Each item in an array has a specific address. Individual items are accessed by referencing their index number.

Indexing in Python, as in most programming languages, starts at 0. It is important to remember that counting starts at 0 and not at 1.

To access an element, you first write the name of the array followed by square brackets. Inside the square brackets you include the item's index number.

The general syntax would look something like this:

array_name[index_value_of_item]

Here is how you would access each individual element in an array:

import array as arr 

numbers = arr.array('i',[10,20,30])

print(numbers[0]) # gets the 1st element
print(numbers[1]) # gets the 2nd element
print(numbers[2]) # gets the 3rd element

#output

#10
#20
#30

Remember that the index value of the last element of an array is always one less than the length of the array. Where n is the length of the array, n - 1 will be the index value of the last item.

Note that you can also access each individual element using negative indexing.

With negative indexing, the last element would have an index of -1, the second to last element would have an index of -2, and so on.

Here is how you would get each item in an array using that method:

import array as arr 

numbers = arr.array('i',[10,20,30])

print(numbers[-1]) #gets last item
print(numbers[-2]) #gets second to last item
print(numbers[-3]) #gets first item
 
#output

#30
#20
#10
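
As a short sketch of both indexing rules, the example below reads the last item in two ways and then shows what happens when the index goes one past the end. The names last_index, last_item, same_item, and out_of_range are chosen for this example only.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# the last valid index is always len(numbers) - 1
last_index = len(numbers) - 1
last_item = numbers[last_index]

# index -1 refers to the same element
same_item = numbers[-1]

# an index equal to the length is already out of range
try:
    numbers[len(numbers)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last_item, same_item, out_of_range)
```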

How to Search Through an Array in Python

You can find out an element's index number by using the index() method.

You pass the value of the element being searched as the argument to the method, and the element's index number is returned.

import array as arr 

numbers = arr.array('i',[10,20,30])

#search for the index of the value 10
print(numbers.index(10))

#output

#0

If there is more than one element with the same value, the index of the first instance of the value will be returned:

import array as arr 


numbers = arr.array('i',[10,20,30,10,20,30])

#search for the index of the value 10
#will return the index number of the first instance of the value 10
print(numbers.index(10))

#output

#0
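
It also helps to know what happens when the value is not in the array at all. In this sketch, index() raises a ValueError for a missing value, and the in operator is used as a way to check first. The variable names are made up for the example.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# index() raises ValueError when the value is not in the array
try:
    position = numbers.index(99)
except ValueError:
    position = None

# the in operator checks for membership without raising an error
has_twenty = 20 in numbers
has_ninety_nine = 99 in numbers

print(position, has_twenty, has_ninety_nine)
```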

How to Loop through an Array in Python

You've seen how to access each individual element in an array and print it out on its own.

You've also seen how to print the array, using the print() function. That gives the following result:

import array as arr 

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])

What if you want to print each value one by one?

This is where a loop comes in handy. You can loop through the array and print out each value, one-by-one, with each loop iteration.

For this you can use a simple for loop:

import array as arr 

numbers = arr.array('i',[10,20,30])

for number in numbers:
    print(number)
    
#output
#10
#20
#30

You could also use the range() function, and pass the result of len() as its argument. This would give the same result as above:

import array as arr  

values = arr.array('i',[10,20,30])

#prints each individual value in the array
for value in range(len(values)):
    print(values[value])

#output

#10
#20
#30
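
If you need both the position and the value in each iteration, one more option is the built-in enumerate() function. This is a small sketch that is not part of the examples above; the pairs list exists only to collect the results.

```python
import array as arr

values = arr.array('i', [10, 20, 30])

# enumerate() yields (index, value) pairs while looping
pairs = []
for position, value in enumerate(values):
    pairs.append((position, value))
    print(position, value)
```

This avoids indexing into the array by hand, which is what the range(len()) version has to do.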

How to Slice an Array in Python

To access a specific range of values inside the array, use the slicing operator, which is a colon :.

When you use the slicing operator with only one value after the colon, counting starts from 0 by default. The slice begins at the first item and goes up to, but does not include, the index number you specify.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#get the values 10 and 20 only
print(numbers[:2])  #first to second position

#output

#array('i', [10, 20])

When you put a number on each side of the colon, you specify a range of positions. In this case, the slice starts at the index given by the first number and goes up to, but does not include, the index given by the second one:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])


#get the values 20 and 30 only
print(numbers[1:3]) #second to third position

#output

#array('i', [20, 30])
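Slice bounds can also be negative, in which case they count from the end of the array. The sketch below also illustrates that a slice is a new array, so changing it does not affect the original; the name last_two is illustrative.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# a negative start counts from the end: the last two values
last_two = numbers[-2:]
print(last_two)

# the slice is a separate array, so numbers stays unchanged
last_two[0] = 99
print(last_two)
print(numbers)

#output

#array('i', [20, 30])
#array('i', [99, 30])
#array('i', [10, 20, 30])
```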

Methods For Performing Operations on Arrays in Python

Arrays are mutable, which means they are changeable. You can change the value of the different items, add new ones, or remove any you don't want in your program anymore.

Let's see some of the most commonly used methods which are used for performing operations on arrays.

How to Change the Value of an Item in an Array

You can change the value of a specific element by specifying its position and assigning it a new value:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#change the first element
#change it from having a value of 10 to having a value of 40
numbers[0] = 40

print(numbers)

#output

#array('i', [40, 20, 30])

How to Add a New Value to an Array

To add one single value at the end of an array, use the append() method:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 to the end of numbers
numbers.append(40)

print(numbers)

#output

#array('i', [10, 20, 30, 40])

Be aware that the new item you add needs to be the same data type as the rest of the items in the array.

Look what happens when I try to add a float to an array of integers:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#try to add the float 40.0 to the end of numbers
numbers.append(40.0)

print(numbers)

#output

#Traceback (most recent call last):
#  File "/Users/dionysialemonaki/python_articles/demo.py", line 19, in <module>
#   numbers.append(40.0)
#TypeError: 'float' object cannot be interpreted as an integer

But what if you want to add more than one value to the end of an array?

Use the extend() method, which takes an iterable (such as a list of items) as an argument. Again, make sure that the new items are all the same data type.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#add the integers 40,50,60 to the end of numbers
#The numbers need to be enclosed in square brackets

numbers.extend([40,50,60])

print(numbers)

#output

#array('i', [10, 20, 30, 40, 50, 60])

And what if you don't want to add an item to the end of an array? Use the insert() method, to add an item at a specific position.

The insert() method takes two arguments: the index number of the position where the new element will be inserted, and the value of the new element.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 in the first position
#remember indexing starts at 0

numbers.insert(0,40)

print(numbers)

#output

#array('i', [40, 10, 20, 30])

How to Remove a Value from an Array

To remove an element from an array, use the remove() method and include the value as an argument to the method.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30])

With remove(), only the first instance of the value you pass as an argument will be removed.

See what happens when the same value appears more than once:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30,10,20])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30, 10, 20])

Only the first occurrence of 10 is removed.

You can also use the pop() method, and specify the position of the element to be removed:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30,10,20])

#remove the element at index 0 (the first 10)
numbers.pop(0)

print(numbers)

#output

#array('i', [20, 30, 10, 20])
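Besides remove() and pop(), Python's del statement also works on arrays, for a single index or for a slice. The following is a brief sketch of both forms.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 10, 20])

# delete the element at index 0
del numbers[0]
print(numbers)

# delete the elements at indexes 1 and 2
del numbers[1:3]
print(numbers)

#output

#array('i', [20, 30, 10, 20])
#array('i', [20, 20])
```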

Conclusion

And there you have it - you now know the basics of how to create arrays in Python using the array module. Hopefully you found this guide helpful.

Thanks for reading and happy coding!

#python #programming 

Connor Mills

1670560264

Understanding Arrays in Python

Learn how to use Python arrays. Create arrays in Python using the array module. You'll see how to define them and the different methods commonly used for performing operations on them.
 

The article covers arrays that you create by importing the array module. We won't cover NumPy arrays here.

Table of Contents

  1. Introduction to Arrays
    1. The differences between Lists and Arrays
    2. When to use arrays
  2. How to use arrays
    1. Define arrays
    2. Find the length of arrays
    3. Array indexing
    4. Search through arrays
    5. Loop through arrays
    6. Slice an array
  3. Array methods for performing operations
    1. Change an existing value
    2. Add a new value
    3. Remove a value
  4. Conclusion

Let's get started!


What are Python Arrays?

Arrays are a fundamental data structure, and an important part of most programming languages. In Python, they are containers which are able to store more than one item at the same time.

Specifically, they are an ordered collection of elements with every value being of the same data type. That is the most important thing to remember about Python arrays - the fact that they can only hold a sequence of multiple items that are of the same type.

What's the Difference between Python Lists and Python Arrays?

Lists are one of the most common data structures in Python, and a core part of the language.

Lists and arrays behave similarly.

Just like arrays, lists are an ordered sequence of elements.

They are also mutable and not fixed in size, which means they can grow and shrink throughout the life of the program. Items can be added and removed, making them very flexible to work with.

However, lists and arrays are not the same thing.

Lists store items that are of various data types. This means that a list can contain integers, floating point numbers, strings, or any other Python data type, at the same time. That is not the case with arrays.

As mentioned in the section above, arrays store only items that are of the same single data type. There are arrays that contain only integers, or only floating point numbers, or only values of another type that the array module supports.

When to Use Python Arrays

Lists are built into the Python programming language, whereas arrays aren't. Arrays are not a built-in data structure, and therefore need to be imported via the array module in order to be used.

Arrays of the array module are a thin wrapper over C arrays, and are useful when you want to work with homogeneous data.

They are also more compact than lists: each element is stored as a raw C value, so an array usually takes up less memory than a list holding the same numbers.
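As a rough illustration of that compactness, the sketch below inspects how many bytes one element occupies and prints the container sizes reported by sys.getsizeof(). The exact numbers depend on the platform and Python build, so no output is shown; the name values is illustrative.

```python
import array as arr
import sys

values = list(range(1000))
numbers = arr.array('i', values)

# bytes used by one element of the array
print(numbers.itemsize)

# the raw data occupies exactly itemsize * length bytes
print(len(numbers.tobytes()) == numbers.itemsize * len(numbers))

# container sizes as reported by the interpreter (platform dependent);
# the list figure does not include the int objects the list points to
print(sys.getsizeof(numbers), sys.getsizeof(values))
```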

If you want to perform mathematical calculations, then you should use NumPy arrays by importing the NumPy package. Otherwise, use Python arrays only when you really need to, as lists work in a similar way and are more flexible to work with.

How to Use Arrays in Python

In order to create Python arrays, you'll first have to import the array module, which contains all the necessary functions.

There are three ways you can import the array module:

  1. By using import array at the top of the file. This imports the array module. You would then go on to create an array using array.array().
import array

#how you would create an array
array.array()
  2. Instead of having to type array.array() all the time, you could use import array as arr at the top of the file, instead of import array alone. You would then create an array by typing arr.array(). The arr acts as an alias name, with the array constructor then immediately following it.
import array as arr

#how you would create an array
arr.array()
  3. Lastly, you could also use from array import *, with * importing every public name the module provides. You would then create an array by writing the array() constructor alone.
from array import *

#how you would create an array
array()

How to Define Arrays in Python

Once you've imported the array module, you can then go on to define a Python array.

The general syntax for creating an array looks like this:

variable_name = array(typecode,[elements])

Let's break it down:

  • variable_name would be the name of the array.
  • The typecode specifies what kind of elements would be stored in the array: for example, whether it would be an array of integers or an array of floats. Remember that all elements should be of the same data type.
  • Inside square brackets you mention the elements that would be stored in the array, with each element being separated by a comma. You can also create an empty array by just writing variable_name = array(typecode) alone, without any elements.

Below is a typecode table, with the different typecodes that can be used with the different data types when defining Python arrays:

TYPECODE   C TYPE               PYTHON TYPE         MINIMUM SIZE IN BYTES
'b'        signed char          int                 1
'B'        unsigned char        int                 1
'u'        wchar_t              Unicode character   2
'h'        signed short         int                 2
'H'        unsigned short       int                 2
'i'        signed int           int                 2
'I'        unsigned int         int                 2
'l'        signed long          int                 4
'L'        unsigned long        int                 4
'q'        signed long long     int                 8
'Q'        unsigned long long   int                 8
'f'        float                float               4
'd'        double               float               8
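The sizes in the table are minimums, and the actual size of an element depends on the platform. As a small sketch, the itemsize attribute can be used to check the real size on your machine for a few of the typecodes; the dictionary name sizes is illustrative, and no output is shown because it varies between systems.

```python
import array as arr

# look up the actual element size, in bytes, for a selection of typecodes
sizes = {}
for code in ['b', 'h', 'i', 'l', 'q', 'f', 'd']:
    sizes[code] = arr.array(code).itemsize

for code, size in sizes.items():
    print(code, size)
```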

Tying everything together, here is an example of how you would define an array in Python:

import array as arr 

numbers = arr.array('i',[10,20,30])


print(numbers)

#output

#array('i', [10, 20, 30])

Let's break it down:

  • First we included the array module, in this case with import array as arr .
  • Then, we created a numbers array.
  • We used arr.array() because of import array as arr .
  • Inside the array() constructor, we first included 'i', the typecode for a signed integer. Signed means that the array can hold both positive and negative values. An unsigned typecode, such as 'H', would mean that no negative values are allowed.
  • Lastly, we included the values to be stored in the array in square brackets.
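To make the signed versus unsigned distinction concrete, here is a short sketch that tries the same values with the unsigned typecode 'H' and the signed typecode 'h'. The flag name rejected is illustrative.

```python
import array as arr

# 'H' is unsigned short, so a negative value is rejected
rejected = False
try:
    arr.array('H', [10, -20, 30])
except OverflowError:
    rejected = True
    print("negative value rejected")

# 'h' is signed short, so the same values are accepted
signed = arr.array('h', [10, -20, 30])
print(signed)

#output

#negative value rejected
#array('h', [10, -20, 30])
```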

Keep in mind that if you tried to include values that were not of i typecode, meaning they were not integer values, you would get an error:

import array as arr 

numbers = arr.array('i',[10.0,20,30])


print(numbers)

#output

#Traceback (most recent call last):
# File "/Users/dionysialemonaki/python_articles/demo.py", line 14, in <module>
#   numbers = arr.array('i',[10.0,20,30])
#TypeError: 'float' object cannot be interpreted as an integer

In the example above, I tried to include a floating point number in the array. I got an error because this is meant to be an integer array only.

Another way to create an array is the following:

from array import *

#an array of floating point values
numbers = array('d',[10.0,20.0,30.0])

print(numbers)

#output

#array('d', [10.0, 20.0, 30.0])

The example above imported the array module via from array import * and created an array numbers of float data type. This means that it holds only floating point numbers, which is specified with the 'd' typecode.
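The initial values are optional, and they do not have to come from a list. As a brief sketch, the example below starts with an empty array and then builds another one from a range; the name evens is illustrative.

```python
from array import array

# an empty array of signed integers, to be filled in later
numbers = array('i')
print(len(numbers))

numbers.append(10)
print(numbers)

# any iterable of suitable values can act as the initializer
evens = array('i', range(0, 10, 2))
print(evens)

#output

#0
#array('i', [10])
#array('i', [0, 2, 4, 6, 8])
```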

How to Find the Length of an Array in Python

To find out the exact number of elements contained in an array, use the built-in len() function.

It will return the integer number that is equal to the total number of elements in the array you specify.

import array as arr 

numbers = arr.array('i',[10,20,30])


print(len(numbers))

#output
# 3

In the example above, the array contained three elements – 10, 20, 30 – so the length of numbers is 3.

Array Indexing and How to Access Individual Items in an Array in Python

Each item in an array has a specific address. Individual items are accessed by referencing their index number.

Indexing in Python, as in most programming languages, starts at 0. It is important to remember that counting starts at 0 and not at 1.

To access an element, you first write the name of the array followed by square brackets. Inside the square brackets you include the item's index number.

The general syntax would look something like this:

array_name[index_value_of_item]

Here is how you would access each individual element in an array:

import array as arr 

numbers = arr.array('i',[10,20,30])

print(numbers[0]) # gets the 1st element
print(numbers[1]) # gets the 2nd element
print(numbers[2]) # gets the 3rd element

#output

#10
#20
#30

Remember that the index value of the last element of an array is always one less than the length of the array. Where n is the length of the array, n - 1 will be the index value of the last item.

Note that you can also access each individual element using negative indexing.

With negative indexing, the last element would have an index of -1, the second to last element would have an index of -2, and so on.

Here is how you would get each item in an array using that method:

import array as arr 

numbers = arr.array('i',[10,20,30])

print(numbers[-1]) #gets last item
print(numbers[-2]) #gets second to last item
print(numbers[-3]) #gets first item
 
#output

#30
#20
#10
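Since the last valid index is always the length minus one, asking for the index equal to the length fails. The sketch below shows both cases; the name last_index is illustrative.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

last_index = len(numbers) - 1
print(numbers[last_index])

# one step past the last valid index raises IndexError
try:
    numbers[len(numbers)]
except IndexError:
    print("index out of range")

#output

#30
#index out of range
```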

How to Search Through an Array in Python

You can find out an element's index number by using the index() method.

You pass the value of the element being searched as the argument to the method, and the element's index number is returned.

import array as arr 

numbers = arr.array('i',[10,20,30])

#search for the index of the value 10
print(numbers.index(10))

#output

#0

If there is more than one element with the same value, the index of the first instance of the value will be returned:

import array as arr 


numbers = arr.array('i',[10,20,30,10,20,30])

#search for the index of the value 10
#will return the index number of the first instance of the value 10
print(numbers.index(10))

#output

#0
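If the value is not in the array at all, index() raises a ValueError. One way to guard against that, sketched below, is to test membership with the in operator first. The count() method, also shown, reports how many times a value occurs.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 10])

# checking membership first avoids the ValueError from index()
if 40 in numbers:
    print(numbers.index(40))
else:
    print("40 is not in the array")

# count() reports how many times a value occurs
print(numbers.count(10))

#output

#40 is not in the array
#2
```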

How to Loop through an Array in Python

You've seen how to access each individual element in an array and print it out on its own.

You've also seen how to print the whole array, using the built-in print() function. That gives the following result:

import array as arr 

numbers = arr.array('i',[10,20,30])

print(numbers)

#output

#array('i', [10, 20, 30])

What if you want to print each value one by one?

This is where a loop comes in handy. You can loop through the array and print out each value, one-by-one, with each loop iteration.

For this you can use a simple for loop:

import array as arr 

numbers = arr.array('i',[10,20,30])

for number in numbers:
    print(number)
    
#output
#10
#20
#30

You could also use the range() function, and pass it the array's length, obtained with len(), as its argument. The loop variable is then an index rather than a value. This would give the same result as above:

import array as arr  

values = arr.array('i',[10,20,30])

#prints each individual value in the array
for value in range(len(values)):
    print(values[value])

#output

#10
#20
#30
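When both the index and the value are needed in the same loop, the built-in enumerate() function is a common alternative to range(len(...)). A minimal sketch:

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# enumerate() yields (index, value) pairs
for index, number in enumerate(numbers):
    print(index, number)

#output

#0 10
#1 20
#2 30
```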

How to Slice an Array in Python

To access a specific range of values inside the array, use the slicing operator, which is a colon :.

When you give only one value, placed after the colon, the slice starts from index 0 by default. It begins at the first item and goes up to, but does not include, the index number you specify.


import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#get the values 10 and 20 only
print(numbers[:2])  #first to second position

#output

#array('i', [10, 20])

When you put a number on each side of the colon, you specify a range of positions. In this case, the slice starts at the index given by the first number and goes up to, but does not include, the index given by the second one:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])


#get the values 20 and 30 only
print(numbers[1:3]) #second to third position

#output

#array('i', [20, 30])
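The end of a slice can be left out as well, and a second colon introduces an optional step. The sketch below uses a slightly longer array to show these forms.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30, 40, 50])

print(numbers[2:])    # from index 2 to the end
print(numbers[::2])   # every second value
print(numbers[::-1])  # a reversed copy

#output

#array('i', [30, 40, 50])
#array('i', [10, 30, 50])
#array('i', [50, 40, 30, 20, 10])
```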

Methods For Performing Operations on Arrays in Python

Arrays are mutable, which means they are changeable. You can change the value of the different items, add new ones, or remove any you don't want in your program anymore.

Let's see some of the most commonly used methods which are used for performing operations on arrays.

How to Change the Value of an Item in an Array

You can change the value of a specific element by specifying its position and assigning it a new value:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#change the first element
#change it from having a value of 10 to having a value of 40
numbers[0] = 40

print(numbers)

#output

#array('i', [40, 20, 30])
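Several values can be changed at once by assigning to a slice. As the sketch below shows, the right-hand side has to be another array of the same typecode; a plain list is not accepted.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# replace the first two values; the right-hand side must be an array
numbers[0:2] = arr.array('i', [40, 50])
print(numbers)

# a plain list is rejected with a TypeError
try:
    numbers[0:2] = [1, 2]
except TypeError:
    print("a list cannot be assigned to an array slice")

#output

#array('i', [40, 50, 30])
#a list cannot be assigned to an array slice
```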

How to Add a New Value to an Array

To add one single value at the end of an array, use the append() method:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 to the end of numbers
numbers.append(40)

print(numbers)

#output

#array('i', [10, 20, 30, 40])

Be aware that the new item you add needs to be the same data type as the rest of the items in the array.

Look what happens when I try to add a float to an array of integers:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#try to add the float 40.0 to the end of numbers
numbers.append(40.0)

print(numbers)

#output

#Traceback (most recent call last):
#  File "/Users/dionysialemonaki/python_articles/demo.py", line 19, in <module>
#   numbers.append(40.0)
#TypeError: 'float' object cannot be interpreted as an integer
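If the value really is a whole number that happens to be stored as a float, one possible fix is to convert it with int() before appending. This is only a sketch; note that int() drops any fractional part. The name new_value is illustrative.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

new_value = 40.0

# int() truncates toward zero, so 40.0 becomes 40
numbers.append(int(new_value))
print(numbers)

#output

#array('i', [10, 20, 30, 40])
```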

But what if you want to add more than one value to the end of an array?

Use the extend() method, which takes an iterable (such as a list of items) as an argument. Again, make sure that the new items are all the same data type.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#add the integers 40,50,60 to the end of numbers
#The numbers need to be enclosed in square brackets

numbers.extend([40,50,60])

print(numbers)

#output

#array('i', [10, 20, 30, 40, 50, 60])
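The argument to extend() does not have to be a list. The sketch below extends an array from a range, and also shows that extending with another array only works when both share the same typecode.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# any iterable of integers works, not only a list
numbers.extend(range(40, 70, 10))
print(numbers)

# another array must have the same typecode
try:
    numbers.extend(arr.array('d', [70.0]))
except TypeError:
    print("typecodes do not match")

#output

#array('i', [10, 20, 30, 40, 50, 60])
#typecodes do not match
```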

And what if you don't want to add an item to the end of an array? Use the insert() method, to add an item at a specific position.

The insert() method takes two arguments: the index number of the position where the new element will be inserted, and the value of the new element.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

#add the integer 40 in the first position
#remember indexing starts at 0

numbers.insert(0,40)

print(numbers)

#output

#array('i', [40, 10, 20, 30])
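The new element is placed before the item currently at the given index, and a negative index counts from the end of the array. A short sketch of both cases:

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# insert 15 before the element currently at index 1
numbers.insert(1, 15)
print(numbers)

# a negative index counts from the end: insert before the last element
numbers.insert(-1, 25)
print(numbers)

#output

#array('i', [10, 15, 20, 30])
#array('i', [10, 15, 20, 25, 30])
```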

How to Remove a Value from an Array

To remove an element from an array, use the remove() method and include the value as an argument to the method.

import array as arr 

#original array
numbers = arr.array('i',[10,20,30])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30])

With remove(), only the first instance of the value you pass as an argument will be removed.

See what happens when the same value appears more than once:


import array as arr 

#original array
numbers = arr.array('i',[10,20,30,10,20])

numbers.remove(10)

print(numbers)

#output

#array('i', [20, 30, 10, 20])

Only the first occurrence of 10 is removed.

You can also use the pop() method, and specify the position of the element to be removed:

import array as arr 

#original array
numbers = arr.array('i',[10,20,30,10,20])

#remove the element at index 0 (the first 10)
numbers.pop(0)

print(numbers)

#output

#array('i', [20, 30, 10, 20])
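Two further details are worth a quick sketch: pop() returns the element it removes and, with no argument, takes the last one, while remove() raises a ValueError if the value is not present. The name last is illustrative.

```python
import array as arr

numbers = arr.array('i', [10, 20, 30])

# pop() with no argument removes and returns the last element
last = numbers.pop()
print(last)
print(numbers)

# remove() raises ValueError when the value is absent
try:
    numbers.remove(99)
except ValueError:
    print("99 is not in the array")

#output

#30
#array('i', [10, 20])
#99 is not in the array
```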

Conclusion

And there you have it - you now know the basics of how to create arrays in Python using the array module. Hopefully you found this guide helpful.

Thanks for reading and happy coding!

Original article source at https://www.freecodecamp.org

#python