To check whether a string is empty, null, or undefined in JavaScript, use the following code snippet.
var emptyString;
if(!emptyString){
// String is empty
}
In JavaScript, an empty string, null, and undefined are all falsy, so the negated condition !emptyString evaluates to true in each of the following cases.
var undefinedString;
if(!undefinedString){
console.log("string is undefined");
}
var emptyString="";
if(!emptyString){
console.log("string is empty");
}
var nullString=null;
if(!nullString){
console.log("string is null");
}
Note that this check does not distinguish strings from other falsy values: 0, NaN, and false would also pass the !value test.
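If only strings should count as "empty", add a type check so other falsy values are rejected. A minimal sketch (the isBlank name is just for illustration):
function isBlank(value) {
  // true only for undefined, null, or a zero-length string
  return value === undefined || value === null ||
    (typeof value === "string" && value.length === 0);
}
isBlank("");   // true
isBlank(null); // true
isBlank(0);    // false, 0 is falsy but not a string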
If the string is known to be neither undefined nor null and you want to check for an empty string, use the length property of the string:
var emptyString="";
if(emptyString && emptyString.length==0){
console.log("string is empty");
}
Or you can check for the empty string directly with the strict equality operator === as shown below.
if(emptyString===""){
console.log("string is empty");
}
Thank you for reading!
#JavaScript #String
The SheetJS Community Edition offers battle-tested open-source solutions for extracting useful data from almost any complex spreadsheet and generating new spreadsheets that will work with legacy and modern software alike.
SheetJS Pro offers solutions beyond data processing: Edit complex templates with ease; let out your inner Picasso with styling; make custom sheets with images/graphs/PivotTables; evaluate formula expressions and port calculations to web apps; automate common spreadsheet tasks, and much more!
Each standalone release script is available at https://cdn.sheetjs.com/.
The current version is 0.18.6 and can be referenced as follows:
<!-- use version 0.18.6 -->
<script lang="javascript" src="https://cdn.sheetjs.com/xlsx-0.18.6/package/dist/xlsx.full.min.js"></script>
The latest tag references the latest version and updates with each release:
<!-- use the latest version -->
<script lang="javascript" src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
For production use, scripts should be downloaded and added to a public folder alongside other scripts.
Browser builds (click to show)
The complete single-file version is generated at dist/xlsx.full.min.js.
dist/xlsx.core.min.js omits the codepage library (no support for XLS encodings).
A slimmer build is generated at dist/xlsx.mini.min.js, which omits the codepage library and support for some legacy formats compared to the full build.
These scripts are also available on the CDN:
<!-- use xlsx.mini.min.js from version 0.18.6 -->
<script lang="javascript" src="https://cdn.sheetjs.com/xlsx-0.18.6/package/dist/xlsx.mini.min.js"></script>
Bower will pull the entire repo:
$ bower install js-xlsx
Bower will place the standalone scripts in bower_components/js-xlsx/dist/
Internet Explorer and ECMAScript 3 Compatibility (click to show)
For broad compatibility with JavaScript engines, the library is written in the ECMAScript 3 dialect and uses some ES5 features like Array#forEach. Older browsers require shims to provide missing functions.
To use the shim, add it before the script tag that loads xlsx.js:
<!-- add the shim first -->
<script type="text/javascript" src="shim.min.js"></script>
<!-- after the shim is referenced, add the library -->
<script type="text/javascript" src="xlsx.full.min.js"></script>
Due to SSL certificate compatibility issues, it is highly recommended to save the Standalone and Shim scripts from https://cdn.sheetjs.com/ and add to a public directory in the site.
The script also includes IE_LoadFile and IE_SaveFile for loading and saving files in Internet Explorer versions 6-9. The xlsx.extendscript.js script bundles the shim in a format suitable for Photoshop and other Adobe products.
Browser ESM
The ECMAScript Module build is saved to xlsx.mjs and can be directly added to a page with a script tag using type="module":
<script type="module">
import { read, writeFileXLSX } from "https://cdn.sheetjs.com/xlsx-0.18.6/package/xlsx.mjs";
/* load the codepage support library for extended support with older formats */
import { set_cptable } from "https://cdn.sheetjs.com/xlsx-0.18.6/package/xlsx.mjs";
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.18.6/package/dist/cpexcel.full.mjs';
set_cptable(cptable);
</script>
Frameworks (Angular, VueJS, React) and Bundlers (webpack, etc)
The NodeJS package is readily installed from the tarballs:
$ npm install --save https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz # npm
$ pnpm install --save https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz # pnpm
$ yarn add https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz # yarn
Once installed, the library can be imported under the name xlsx:
import { read, writeFileXLSX } from "xlsx";
/* load the codepage support library for extended support with older formats */
import { set_cptable } from "xlsx";
import * as cptable from 'xlsx/dist/cpexcel.full.mjs';
set_cptable(cptable);
xlsx.mjs can be imported in Deno:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.18.6/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.18.6/package/xlsx.mjs';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.18.6/package/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
Tarballs are available on https://cdn.sheetjs.com.
Each individual version can be referenced using a similar URL pattern. https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz is the URL for 0.18.6.
https://cdn.sheetjs.com/xlsx-latest/xlsx-latest.tgz is a link to the latest version and will refresh on each release.
Installation
Tarballs can be directly installed using a package manager:
$ npm install https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz # npm
$ pnpm install https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz # pnpm
$ yarn add https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz # yarn
For general stability, "vendoring" modules is the recommended approach:
Download the tarball (xlsx-0.18.6.tgz) for the desired version. The current version is available at https://cdn.sheetjs.com/xlsx-0.18.6/xlsx-0.18.6.tgz.
Create a vendor subdirectory at the root of your project and move the tarball to that folder. Add it to your project repository.
Install the tarball using a package manager:
$ npm install --save file:vendor/xlsx-0.18.6.tgz # npm
$ pnpm install --save file:vendor/xlsx-0.18.6.tgz # pnpm
$ yarn add file:vendor/xlsx-0.18.6.tgz # yarn
The package will be installed and accessible as xlsx.
Usage
By default, the module supports require and it will automatically add support for streams and filesystem access:
var XLSX = require("xlsx");
The module also ships with xlsx.mjs for use with import. The mjs version does not automatically load native node modules:
import * as XLSX from 'xlsx/xlsx.mjs';
/* load 'fs' for readFile and writeFile support */
import * as fs from 'fs';
XLSX.set_fs(fs);
/* load 'stream' for stream support */
import { Readable } from 'stream';
XLSX.stream.set_readable(Readable);
/* load the codepage support library for extended support with older formats */
import * as cpexcel from 'xlsx/dist/cpexcel.full.mjs';
XLSX.set_cptable(cpexcel);
dist/xlsx.extendscript.js is an ExtendScript build for Photoshop and InDesign. https://cdn.sheetjs.com/xlsx-0.18.6/package/dist/xlsx.extendscript.js is the current version. After downloading the script, it can be directly referenced with a #include directive:
#include "xlsx.extendscript.js"
Most scenarios involving spreadsheets and data can be broken into 5 parts:
Acquire Data: Data may be stored anywhere: local or remote files, databases, HTML TABLE, or even generated programmatically in the web browser.
Extract Data: For spreadsheet files, this involves parsing raw bytes to read the cell data. For general JS data, this involves reshaping the data.
Process Data: From generating summary statistics to cleaning data records, this step is the heart of the problem.
Package Data: This can involve making a new spreadsheet or serializing with JSON.stringify or writing XML or simply flattening data for UI tools.
Release Data: Spreadsheet files can be uploaded to a server or written locally. Data can be presented to users in an HTML TABLE or data grid.
A common problem involves generating a valid spreadsheet export from data stored in an HTML table. In this example, an HTML TABLE on the page will be scraped, a row will be added to the bottom with the date of the report, and a new file will be generated and downloaded locally. XLSX.writeFile takes care of packaging the data and attempting a local download:
// Acquire Data (reference to the HTML table)
var table_elt = document.getElementById("my-table-id");
// Extract Data (create a workbook object from the table)
var workbook = XLSX.utils.table_to_book(table_elt);
// Process Data (add a new row)
var ws = workbook.Sheets["Sheet1"];
XLSX.utils.sheet_add_aoa(ws, [["Created "+new Date().toISOString()]], {origin:-1});
// Package and Release Data (`writeFile` tries to write and save an XLSB file)
XLSX.writeFile(workbook, "Report.xlsb");
This library tries to simplify steps 2 and 4 with functions to extract useful data from spreadsheet files (read / readFile) and generate new spreadsheet files from data (write / writeFile). Additional utility functions like table_to_book work with other common data sources like HTML tables.
This documentation and various demo projects cover a number of common scenarios and approaches for steps 1 and 5.
Utility functions help with step 3.
"Acquiring and Extracting Data" describes solutions for common data import scenarios.
"Packaging and Releasing Data" describes solutions for common data export scenarios.
"Processing Data" describes solutions for common workbook processing and manipulation scenarios.
"Utility Functions" details utility functions for translating JSON Arrays and other common JS structures into worksheet objects.
Data processing should fit in any workflow
The library does not impose a separate lifecycle. It fits nicely in websites and apps built using any framework. The plain JS data objects play nice with Web Workers and future APIs.
JavaScript is a powerful language for data processing
The "Common Spreadsheet Format" is a simple object representation of the core concepts of a workbook. The various functions in the library provide low-level tools for working with the object.
For friendly JS processing, there are utility functions for converting parts of a worksheet to/from an Array of Arrays. The following example combines powerful JS Array methods with a network request library to download data, select the information we want and create a workbook file:
Get Data from a JSON Endpoint and Generate a Workbook (click to show)
The goal is to generate an XLSX workbook of US President names and birthdays.
Acquire Data
Raw Data
https://theunitedstates.io/congress-legislators/executive.json has the desired data. For example, John Adams:
{
"id": { /* (data omitted) */ },
"name": {
"first": "John", // <-- first name
"last": "Adams" // <-- last name
},
"bio": {
"birthday": "1735-10-19", // <-- birthday
"gender": "M"
},
"terms": [
{ "type": "viceprez", /* (other fields omitted) */ },
{ "type": "viceprez", /* (other fields omitted) */ },
{ "type": "prez", /* (other fields omitted) */ } // <-- look for "prez"
]
}
Filtering for Presidents
The dataset includes Aaron Burr, a Vice President who was never President!
Array#filter creates a new array with the desired rows. A President served at least one term with type set to "prez". To test if a particular row has at least one "prez" term, Array#some is another native JS function. The complete filter would be:
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
Lining up the data
For this example, the name will be the first name combined with the last name (row.name.first + " " + row.name.last) and the birthday will be the subfield row.bio.birthday. Using Array#map, the dataset can be massaged in one call:
const rows = prez.map(row => ({
name: row.name.first + " " + row.name.last,
birthday: row.bio.birthday
}));
The result is an array of "simple" objects with no nesting:
[
{ name: "George Washington", birthday: "1732-02-22" },
{ name: "John Adams", birthday: "1735-10-19" },
// ... one row per President
]
Extract Data
With the cleaned dataset, XLSX.utils.json_to_sheet generates a worksheet:
const worksheet = XLSX.utils.json_to_sheet(rows);
XLSX.utils.book_new creates a new workbook and XLSX.utils.book_append_sheet appends a worksheet to the workbook. The new worksheet will be called "Dates":
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
Process Data
Fixing headers
By default, json_to_sheet creates a worksheet with a header row. In this case, the headers come from the JS object keys: "name" and "birthday".
The headers are in cells A1 and B1. XLSX.utils.sheet_add_aoa can write text values to the existing worksheet starting at cell A1:
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
Fixing Column Widths
Some of the names are longer than the default column width. Column widths are set by setting the "!cols" worksheet property. The following line sets the width of column A to approximately 10 characters:
worksheet["!cols"] = [ { wch: 10 } ]; // set column A width to 10 characters
One Array#reduce call over rows can calculate the maximum width:
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
worksheet["!cols"] = [ { wch: max_width } ];
Note: If the starting point was a file or HTML table, XLSX.utils.sheet_to_json will generate an array of JS objects.
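For instance, a minimal sketch of that reverse direction (the file name here is hypothetical):
// read an existing file and recover an array of row objects
var wb = XLSX.readFile("presidents.xlsx"); // hypothetical input file
var first_ws = wb.Sheets[wb.SheetNames[0]];
var rows = XLSX.utils.sheet_to_json(first_ws);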
Package and Release Data
XLSX.writeFile creates a spreadsheet file and tries to write it to the system. In the browser, it will try to prompt the user to download the file. In NodeJS, it will write to the local directory.
XLSX.writeFile(workbook, "Presidents.xlsx");
Complete Example
// Uncomment the next line for use in NodeJS:
// const XLSX = require("xlsx"), axios = require("axios");
(async() => {
/* fetch JSON data and parse */
const url = "https://theunitedstates.io/congress-legislators/executive.json";
const raw_data = (await axios(url, {responseType: "json"})).data;
/* filter for the Presidents */
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
/* flatten objects */
const rows = prez.map(row => ({
name: row.name.first + " " + row.name.last,
birthday: row.bio.birthday
}));
/* generate worksheet and workbook */
const worksheet = XLSX.utils.json_to_sheet(rows);
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
/* fix headers */
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
/* calculate column width */
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
worksheet["!cols"] = [ { wch: max_width } ];
/* create an XLSX file and try to save to Presidents.xlsx */
XLSX.writeFile(workbook, "Presidents.xlsx");
})();
For use in the web browser, assuming the snippet is saved to snippet.js, script tags should be used to include the axios and xlsx standalone builds:
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
<script src="snippet.js"></script>
File formats are implementation details
The parser covers a wide gamut of common spreadsheet file formats to ensure that "HTML-saved-as-XLS" files work as well as actual XLS or XLSX files.
The writer supports a number of common output formats for broad compatibility with the data ecosystem.
To the greatest extent possible, data processing code should not have to worry about the specific file formats involved.
The demos directory includes sample projects for:
Frameworks and APIs
angularjs
angular and ionic
knockout
meteor
react, react-native, next
vue 2.x, weex, nuxt
XMLHttpRequest and fetch
nodejs server
databases and key/value stores
typed arrays and math
Bundlers and Tooling
Platforms and Integrations
deno
electron application
nw.js application
Chrome / Chromium extensions
Download a Google Sheet locally
Adobe ExtendScript
Headless Browsers
canvas-datagrid
x-spreadsheet
react-data-grid
vue3-table-light
Swift JSC and other engines
"serverless" functions
internet explorer
Other examples are included in the showcase.
https://sheetjs.com/demos/modify.html shows a complete example of reading, modifying, and writing files.
https://github.com/SheetJS/sheetjs/blob/HEAD/bin/xlsx.njs is the command-line tool included with node installations, reading spreadsheet files and exporting the contents in various formats.
API
Extract data from spreadsheet bytes
var workbook = XLSX.read(data, opts);
The read method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).
Read spreadsheet bytes from a local file and extract data
var workbook = XLSX.readFile(filename, opts);
The readFile method attempts to read a spreadsheet file at the supplied path. Browsers generally do not allow reading files by path (it is deemed a security risk), and such attempts will throw an error.
The second opts argument is optional. "Parsing Options" covers the supported properties and behaviors.
Examples
Here are a few common scenarios (click on each subtitle to see the code):
Local file in a NodeJS server (click to show)
readFile uses fs.readFileSync under the hood:
var XLSX = require("xlsx");
var workbook = XLSX.readFile("test.xlsx");
For Node ESM, the readFile helper is not enabled. Instead, fs.readFileSync should be used to read the file data as a Buffer for use with XLSX.read:
import { readFileSync } from "fs";
import { read } from "xlsx/xlsx.mjs";
const buf = readFileSync("test.xlsx");
/* buf is a Buffer */
const workbook = read(buf);
Local file in a Deno application (click to show)
readFile uses Deno.readFileSync under the hood:
// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
const workbook = XLSX.readFile("test.xlsx");
Applications reading files must be invoked with the --allow-read flag. The deno demo has more examples.
User-submitted file in a web page ("Drag-and-Drop") (click to show)
For modern websites targeting Chrome 76+, File#arrayBuffer is recommended:
// XLSX is a global from the standalone script
async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);
For maximal compatibility, the FileReader API should be used:
function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);
https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
User-submitted file with an HTML INPUT element (click to show)
Starting with an HTML INPUT element with type="file":
<input type="file" id="input_dom_element">
For modern websites targeting Chrome 76+, Blob#arrayBuffer is recommended:
// XLSX is a global from the standalone script
async function handleFileAsync(e) {
const file = e.target.files[0];
const data = await file.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener("change", handleFileAsync, false);
For broader support (including IE10+), the FileReader approach is recommended:
function handleFile(e) {
var file = e.target.files[0];
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(file);
}
input_dom_element.addEventListener("change", handleFile, false);
The oldie demo shows an IE-compatible fallback scenario.
Fetching a file in the web browser ("Ajax") (click to show)
For modern websites targeting Chrome 42+, fetch is recommended:
// XLSX is a global from the standalone script
(async() => {
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
})();
For broader support, the XMLHttpRequest approach is recommended:
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
};
req.send();
The xhr demo includes a longer discussion and more examples.
http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
Local file in a PhotoShop or InDesign plugin (click to show)
readFile wraps the File logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
The extendscript demo includes a more complex example.
Local file in an Electron app (click to show)
readFile can be used in the renderer process:
/* From the renderer process */
var XLSX = require("xlsx");
var workbook = XLSX.readFile(path);
Electron APIs have changed over time. The electron demo shows a complete example and details the required version-specific settings.
Local file in a mobile app with React Native (click to show)
The react demo includes a sample React Native app.
Since React Native does not provide a way to read files from the filesystem, a third-party library must be used. The snippets below use react-native-file-access and react-native-fs, both of which have been tested.
The base64 encoding returns strings compatible with the base64 type:
import XLSX from "xlsx";
import { FileSystem } from "react-native-file-access";
const b64 = await FileSystem.readFile(path, "base64");
/* b64 is a base64 string */
const workbook = XLSX.read(b64, {type: "base64"});
The ascii encoding returns binary strings compatible with the binary type:
import XLSX from "xlsx";
import { readFile } from "react-native-fs";
const bstr = await readFile(path, "ascii");
/* bstr is a binary string */
const workbook = XLSX.read(bstr, {type: "binary"});
NodeJS Server File Uploads (click to show)
read can accept a NodeJS buffer. readFile can read files generated by an HTTP POST request body parser like formidable:
const XLSX = require("xlsx");
const http = require("http");
const formidable = require("formidable");
const server = http.createServer((req, res) => {
const form = new formidable.IncomingForm();
form.parse(req, (err, fields, files) => {
/* grab the first file */
const f = Object.entries(files)[0][1];
const path = f.filepath;
const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
});
}).listen(process.env.PORT || 7262);
The server demo has more advanced examples.
Download files in a NodeJS process (click to show)
Node 17.5 and 18.0 have native support for fetch:
const XLSX = require("xlsx");
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
For broader compatibility, third-party modules are recommended.
request requires a null encoding to yield Buffers:
var XLSX = require("xlsx");
var request = require("request");
request({url: url, encoding: null}, function(err, resp, body) {
var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
});
axios works the same way in browser and in NodeJS:
const XLSX = require("xlsx");
const axios = require("axios");
(async() => {
const res = await axios.get(url, {responseType: "arraybuffer"});
/* res.data is a Buffer */
const workbook = XLSX.read(res.data);
/* DO SOMETHING WITH workbook HERE */
})();
Download files in an Electron app (click to show)
The net module in the main process can make HTTP/HTTPS requests to external resources. Responses should be manually concatenated using Buffer.concat:
const XLSX = require("xlsx");
const { net } = require("electron");
const req = net.request(url);
req.on("response", (res) => {
const bufs = []; // this array will collect all of the buffers
res.on("data", (chunk) => { bufs.push(chunk); });
res.on("end", () => {
const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
});
});
req.end();
Readable Streams in NodeJS (click to show)
When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end:
var fs = require("fs");
var XLSX = require("xlsx");
function process_RS(stream, cb) {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer, {type:"buffer"});
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
ReadableStream in the browser (click to show)
When dealing with ReadableStream, the easiest approach is to buffer the stream and process the whole thing at the end:
// XLSX is a global from the standalone script
async function process_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}
/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}
return out;
}
const data = await process_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);
More detailed examples are covered in the included demos.
JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.
Create a new Workbook
var workbook = XLSX.utils.book_new();
The book_new utility function creates an empty workbook with no worksheets.
Spreadsheet software generally requires at least one worksheet and enforces the requirement in the user interface. This library enforces the requirement at write time, throwing errors if an empty workbook is passed to write functions.
API
Create a worksheet from an array of arrays of JS values
var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
The aoa_to_sheet utility function walks an "array of arrays" in row-major order, generating a worksheet object. The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:
var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);
"Array of Arrays Input" describes the function and the optional opts
argument in more detail.
Create a worksheet from an array of JS objects
var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
The json_to_sheet utility function walks an array of JS objects in order, generating a worksheet object. By default, it will generate a header row and one row per object in the array. The optional opts argument has settings to control the column order and header output.
"Array of Objects Input" describes the function and the optional opts argument in more detail.
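As a short illustrative sketch, the keys of the first object become the header row:
var worksheet = XLSX.utils.json_to_sheet([
  { name: "George Washington", birthday: "1732-02-22" },
  { name: "John Adams", birthday: "1735-10-19" }
]);
/* A1="name" and B1="birthday"; rows 2 and 3 hold the object values */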
Examples
"Zen of SheetJS" contains a detailed example "Get Data from a JSON Endpoint and Generate a Workbook"
x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser. The xspreadsheet demo includes a sample script with the xtos function for converting from an x-spreadsheet data object to a workbook. https://oss.sheetjs.com/sheetjs/x-spreadsheet is a live demo.
Records from a database query (SQL or no-SQL) (click to show)
The database demo includes examples of working with databases and query results.
Numerical Computations with TensorFlow.js (click to show)
@tensorflow/tfjs and other libraries expect data in simple arrays, well-suited for worksheets where each column is a data vector. That is the transpose of how most people use spreadsheets, where each row is a vector.
When recovering data from tfjs, the returned data points are stored in a typed array. An array of arrays can be constructed with loops. Array#unshift can prepend a title row before the conversion:
const XLSX = require("xlsx");
const tf = require('@tensorflow/tfjs');
/* suppose xs and ys are vectors (1D tensors) -> tfarr will be a typed array */
const tfdata = tf.stack([xs, ys]).transpose();
const shape = tfdata.shape;
const tfarr = tfdata.dataSync();
/* construct the array of arrays */
const aoa = [];
for(let j = 0; j < shape[0]; ++j) {
aoa[j] = [];
for(let i = 0; i < shape[1]; ++i) aoa[j][i] = tfarr[j * shape[1] + i];
}
/* add headers to the top */
aoa.unshift(["x", "y"]);
/* generate worksheet */
const worksheet = XLSX.utils.aoa_to_sheet(aoa);
The array demo shows a complete example.
API
Create a worksheet by scraping an HTML TABLE in the page
var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
The table_to_sheet utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. The opts argument is optional. "HTML Table Input" describes the function in more detail.
Create a workbook by scraping an HTML TABLE in the page
var workbook = XLSX.utils.table_to_book(dom_element, opts);
The table_to_book utility function follows the same logic as table_to_sheet. After generating a worksheet, it creates a blank workbook and appends the spreadsheet.
The options argument supports the same options as table_to_sheet, with the addition of a sheet property to control the worksheet name. If the property is missing or no options are specified, the default name Sheet1 is used.
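For example, a minimal sketch that sets the worksheet name:
var workbook = XLSX.utils.table_to_book(dom_element, { sheet: "Imported" });
/* workbook.SheetNames[0] will be "Imported" instead of "Sheet1" */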
Examples
Here are a few common scenarios (click on each subtitle to see the code):
HTML TABLE element in a webpage (click to show)
<!-- include the standalone script and shim. this uses the SheetJS CDN -->
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>
<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
/* DO SOMETHING WITH workbook HERE */
</script>
Multiple tables on a web page can be converted to individual worksheets:
/* create new workbook */
var workbook = XLSX.utils.book_new();
/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
/* workbook now has 2 worksheets */
Alternatively, the HTML code can be extracted and parsed:
var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)
The chrome demo shows a complete example and details the required permissions and other settings.
In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:
/* in the content script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table");
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
Server-Side HTML Tables with Headless Chrome (click to show)
The headless demo includes a complete demo to convert HTML files to XLSB workbooks. The core idea is to add the script to the page, parse the table in the page context, generate a base64 workbook and send it back for further processing:
const XLSX = require("xlsx");
const { readFileSync } = require("fs"), puppeteer = require("puppeteer");
const url = `https://sheetjs.com/demos/table`;
/* get the standalone build source (node_modules/xlsx/dist/xlsx.full.min.js) */
const lib = readFileSync(require.resolve("xlsx/dist/xlsx.full.min.js"), "utf8");
(async() => {
/* start browser and go to web page */
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, {waitUntil: "networkidle2"});
/* inject library */
await page.addScriptTag({content: lib});
/* this function `s5s` will be called by the script below, receiving the Base64-encoded file */
await page.exposeFunction("s5s", async(b64) => {
const workbook = XLSX.read(b64, {type: "base64" });
/* DO SOMETHING WITH workbook HERE */
});
/* generate XLSB file in webpage context and send back result */
await page.addScriptTag({content: `
/* call table_to_book on first table */
var workbook = XLSX.utils.table_to_book(document.querySelector("TABLE"));
/* generate XLSB file */
var b64 = XLSX.write(workbook, {type: "base64", bookType: "xlsb"});
/* call "s5s" hook exposed from the node process */
window.s5s(b64);
`});
/* cleanup */
await browser.close();
})();
Server-Side HTML Tables with Headless WebKit (click to show)
The headless demo includes a complete demo to convert HTML files to XLSB workbooks using PhantomJS. The core idea is to add the script to the page, parse the table in the page context, generate a binary workbook and send it back for further processing:
var XLSX = require('xlsx');
var page = require('webpage').create();
/* this code will be run in the page */
var code = [ "function(){",
/* call table_to_book on first table */
"var wb = XLSX.utils.table_to_book(document.body.getElementsByTagName('table')[0]);",
/* generate XLSB file and return binary string */
"return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
"}" ].join("");
page.open('https://sheetjs.com/demos/table', function() {
/* Load the browser script from the SheetJS CDN */
page.includeJs("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js", function() {
/* The code will return an XLSB file encoded as binary string */
var bin = page.evaluateJavaScript(code);
var workbook = XLSX.read(bin, {type: "binary"});
/* DO SOMETHING WITH workbook HERE */
phantom.exit();
});
});
NodeJS HTML Tables without a browser (click to show)
NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. jsdom is a lightweight alternative:
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");
/* obtain HTML string. This example reads from test.html */
const html_str = readFileSync("test.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
The "Common Spreadsheet Format" is a simple object representation of the core concepts of a workbook. The utility functions work with the object representation and are intended to handle common use cases.
API
Append a Worksheet to a Workbook
XLSX.utils.book_append_sheet(workbook, worksheet, sheet_name);
The book_append_sheet utility function appends a worksheet to the workbook. The third argument specifies the desired worksheet name. Multiple worksheets can be added to a workbook by calling the function multiple times. If the worksheet name is already used in the workbook, it will throw an error.
Append a Worksheet to a Workbook and find a unique name
var new_name = XLSX.utils.book_append_sheet(workbook, worksheet, name, true);
If the fourth argument is true, the function will start with the specified worksheet name. If the sheet name exists in the workbook, a new worksheet name will be chosen by finding the name stem and incrementing the counter:
XLSX.utils.book_append_sheet(workbook, sheetA, "Sheet2", true); // Sheet2
XLSX.utils.book_append_sheet(workbook, sheetB, "Sheet2", true); // Sheet3
XLSX.utils.book_append_sheet(workbook, sheetC, "Sheet2", true); // Sheet4
XLSX.utils.book_append_sheet(workbook, sheetD, "Sheet2", true); // Sheet5
List the Worksheet names in tab order
var wsnames = workbook.SheetNames;
The SheetNames property of the workbook object is a list of the worksheet names in "tab order". API functions will look at this array.
Replace a Worksheet in place
workbook.Sheets[sheet_name] = new_worksheet;
The Sheets property of the workbook object is an object whose keys are names and whose values are worksheet objects. By reassigning to a property of the Sheets object, the worksheet object can be changed without disrupting the rest of the workbook structure.
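For example, a minimal sketch that regenerates the first worksheet from new data without touching the tab order:
var first_name = workbook.SheetNames[0];
workbook.Sheets[first_name] = XLSX.utils.aoa_to_sheet([["new", "data"]]);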
Examples
Add a new worksheet to a workbook (click to show)
This example uses XLSX.utils.aoa_to_sheet.
var wb = XLSX.utils.book_new(); /* create a workbook if one does not already exist */
var ws_name = "SheetJS";
/* Create worksheet */
var ws_data = [
[ "S", "h", "e", "e", "t", "J", "S" ],
[ 1 , 2 , 3 , 4 , 5 ]
];
var ws = XLSX.utils.aoa_to_sheet(ws_data);
/* Add the worksheet to the workbook */
XLSX.utils.book_append_sheet(wb, ws, ws_name);
API
Modify a single cell value in a worksheet
XLSX.utils.sheet_add_aoa(worksheet, [[new_value]], { origin: address });
Modify multiple cell values in a worksheet
XLSX.utils.sheet_add_aoa(worksheet, aoa, opts);
The sheet_add_aoa utility function modifies cell values in a worksheet. The first argument is the worksheet object. The second argument is an array of arrays of values. The origin key of the third argument controls where cells will be written. The following snippet sets B3=1 and E5="abc":
XLSX.utils.sheet_add_aoa(worksheet, [
[1], // <-- Write 1 to cell B3
, // <-- Do nothing in row 4
[/*B5*/, /*C5*/, /*D5*/, "abc"] // <-- Write "abc" to cell E5
], { origin: "B3" });
"Array of Arrays Input" describes the function and the optional opts
argument in more detail.
Examples
Appending rows to a worksheet (click to show)
The special origin value -1 instructs sheet_add_aoa to start in column A of the row after the last row in the range, appending the data:
XLSX.utils.sheet_add_aoa(worksheet, [
["first row after data", 1],
["second row after data", 2]
], { origin: -1 });
The "Common Spreadsheet Format" section describes the object structures in greater detail.
API
Generate spreadsheet bytes (file) from data
var data = XLSX.write(workbook, opts);
The write method attempts to package data from the workbook into a file in memory. By default, XLSX files are generated, but that can be controlled with the bookType property of the opts argument. Based on the type option, the data can be stored as a "binary string", JS string, Uint8Array or Buffer.
The second opts argument is required. "Writing Options" covers the supported properties and behaviors.
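For example, a minimal in-memory sketch (mirroring the FileSaver example later in this section):
/* package the workbook as an ArrayBuffer in memory */
var data = XLSX.write(workbook, { bookType: "xlsx", type: "array" });
/* the bytes can back a Blob for manual uploads or downloads */
var blob = new Blob([data], { type: "application/octet-stream" });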
Generate and attempt to save file
XLSX.writeFile(workbook, filename, opts);
The writeFile method packages the data and attempts to save the new file. The export file format is determined by the extension of filename (SheetJS.xlsx signals XLSX export, SheetJS.xlsb signals XLSB export, etc).
The writeFile method uses platform-specific APIs to initiate the file save. In NodeJS, fs.writeFileSync creates the file. In the web browser, a download is attempted using the HTML5 download attribute, with fallbacks for IE.
Generate and attempt to save an XLSX file
XLSX.writeFileXLSX(workbook, filename, opts);
The writeFile method embeds a number of different export functions. This is great for developer experience but not amenable to tree shaking using the current developer tools. When only XLSX exports are needed, the writeFileXLSX method avoids referencing the other export functions.
The second opts argument is optional. "Writing Options" covers the supported properties and behaviors.
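A short sketch with a bundler, assuming workbook is an existing workbook object:
import { writeFileXLSX } from "xlsx";
/* only the XLSX export code is referenced, so bundlers can drop the rest */
writeFileXLSX(workbook, "SheetJS.xlsx");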
Examples
Local file in a NodeJS server (click to show)
writeFile uses fs.writeFileSync in server environments:
var XLSX = require("xlsx");
/* output format determined by filename */
XLSX.writeFile(workbook, "out.xlsb");
For Node ESM, the writeFile helper is not enabled. Instead, XLSX.write should be used to generate a Buffer, which can then be written with fs.writeFileSync:
import { writeFileSync } from "fs";
import { write } from "xlsx/xlsx.mjs";
const buf = write(workbook, {type: "buffer", bookType: "xlsb"});
/* buf is a Buffer */
writeFileSync("out.xlsb", buf);
Local file in a Deno application (click to show)
writeFile uses Deno.writeFileSync under the hood:
// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
XLSX.writeFile(workbook, "test.xlsx");
Applications writing files must be invoked with the --allow-write flag. The deno demo has more examples.
Local file in a PhotoShop or InDesign plugin (click to show)
writeFile wraps the File logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* output format determined by filename */
XLSX.writeFile(workbook, "out.xlsx");
/* at this point, out.xlsx is a file that you can distribute */
The extendscript demo includes a more complex example.
Download a file in the browser to the user machine (click to show)
XLSX.writeFile wraps a few techniques for triggering a file save:
The URL browser API creates an object URL for the file, which the library uses by creating a link and forcing a click. It is supported in modern browsers.
msSaveBlob is an IE10+ API for triggering a file save.
IE_FileSave uses VBScript and ActiveX to write a file in IE6+ for Windows XP and Windows 7. The shim must be included in the containing HTML page.
There is no standard way to determine if the actual file has been downloaded.
/* output format determined by filename */
XLSX.writeFile(workbook, "out.xlsb");
/* at this point, out.xlsb will have been downloaded */
Download a file in legacy browsers (click to show)
XLSX.writeFile techniques work for most modern browsers as well as older IE. For much older browsers, there are workarounds implemented by wrapper libraries.
FileSaver.js implements saveAs. Note: XLSX.writeFile will automatically call saveAs if available.
/* bookType can be any supported output type */
var wopts = { bookType:"xlsx", bookSST:false, type:"array" };
var wbout = XLSX.write(workbook,wopts);
/* the saveAs call downloads a file on the local machine */
saveAs(new Blob([wbout],{type:"application/octet-stream"}), "test.xlsx");
Downloadify uses a Flash SWF button to generate local files, suitable for environments where ActiveX is unavailable:
Downloadify.create(id,{
/* other options are required! read the downloadify docs for more info */
filename: "test.xlsx",
data: function() { return XLSX.write(wb, {bookType:"xlsx", type:"base64"}); },
append: false,
dataType: "base64"
});
The oldie demo shows an IE-compatible fallback scenario.
Browser upload file (ajax) (click to show)
A complete example using XHR is included in the XHR demo, along with examples for fetch and wrapper libraries. This example assumes the server can handle Base64-encoded files (see the demo for a basic nodejs server):
/* in this example, send a base64 string to the server */
var wopts = { bookType:"xlsx", bookSST:false, type:"base64" };
var wbout = XLSX.write(workbook,wopts);
var req = new XMLHttpRequest();
req.open("POST", "/upload", true);
var formdata = new FormData();
formdata.append("file", "test.xlsx"); // <-- server expects `file` to hold name
formdata.append("data", wbout); // <-- `data` holds the base64-encoded data
req.send(formdata);
PhantomJS (Headless Webkit) File Generation (click to show)
The headless demo includes a complete demo to convert HTML files to XLSB workbooks using PhantomJS. PhantomJS fs.write supports writing files from the main process but has a different interface from the NodeJS fs module:
var XLSX = require('xlsx');
var fs = require('fs');
/* generate a binary string */
var bin = XLSX.write(workbook, { type:"binary", bookType: "xlsx" });
/* write to file */
fs.write("test.xlsx", bin, "wb");
Note: The section "Processing HTML Tables" shows how to generate a workbook from HTML tables in a page in "Headless WebKit".
The included demos cover mobile apps and other special deployments.
The streaming write functions are available in the XLSX.stream object. They take the same arguments as the normal write functions but return a NodeJS Readable Stream.
XLSX.stream.to_csv is the streaming version of XLSX.utils.sheet_to_csv.
XLSX.stream.to_html is the streaming version of XLSX.utils.sheet_to_html.
XLSX.stream.to_json is the streaming version of XLSX.utils.sheet_to_json.
nodejs convert to CSV and write file (click to show)
var output_file_name = "out.csv";
var stream = XLSX.stream.to_csv(worksheet);
stream.pipe(fs.createWriteStream(output_file_name));
nodejs write JSON stream to screen (click to show)
var { Transform } = require("stream"); /* Transform is used below to stringify */
/* to_json returns an object-mode stream */
var stream = XLSX.stream.to_json(worksheet, {raw:true});
/* the following stream converts JS objects to text via JSON.stringify */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };
stream.pipe(conv); conv.pipe(process.stdout);
Exporting NUMBERS files (click to show)
The NUMBERS writer requires a fairly large base. The supplementary xlsx.zahl scripts provide support. xlsx.zahl.js is designed for standalone and NodeJS use, while xlsx.zahl.mjs is suitable for ESM.
Browser
<meta charset="utf8">
<script src="xlsx.full.min.js"></script>
<script src="xlsx.zahl.js"></script>
<script>
var wb = XLSX.utils.book_new(); var ws = XLSX.utils.aoa_to_sheet([
["SheetJS", "<3","விரிதாள்"],
[72,,"Arbeitsblätter"],
[,62,"数据"],
[true,false,],
]); XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
XLSX.writeFile(wb, "textport.numbers", {numbers: XLSX_ZAHL, compression: true});
</script>
Node
var XLSX = require("./xlsx.flow");
var XLSX_ZAHL = require("./dist/xlsx.zahl");
var wb = XLSX.utils.book_new(); var ws = XLSX.utils.aoa_to_sheet([
["SheetJS", "<3","விரிதாள்"],
[72,,"Arbeitsblätter"],
[,62,"数据"],
[true,false,],
]); XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
XLSX.writeFile(wb, "textport.numbers", {numbers: XLSX_ZAHL, compression: true});
Deno
import * as XLSX from './xlsx.mjs';
import XLSX_ZAHL from './dist/xlsx.zahl.mjs';
var wb = XLSX.utils.book_new(); var ws = XLSX.utils.aoa_to_sheet([
["SheetJS", "<3","விரிதாள்"],
[72,,"Arbeitsblätter"],
[,62,"数据"],
[true,false,],
]); XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
XLSX.writeFile(wb, "textports.numbers", {numbers: XLSX_ZAHL, compression: true});
https://github.com/sheetjs/sheetaki pipes write streams to nodejs response.
JSON and JS data tend to represent single worksheets. The utility functions in this section work with single worksheets.
The "Common Spreadsheet Format" section describes the object structure in more detail. workbook.SheetNames
is an ordered list of the worksheet names. workbook.Sheets
is an object whose keys are sheet names and whose values are worksheet objects.
The "first worksheet" is stored at workbook.Sheets[workbook.SheetNames[0]]
.
API
Create an array of JS objects from a worksheet
var jsa = XLSX.utils.sheet_to_json(worksheet, opts);
Create an array of arrays of JS values from a worksheet
var aoa = XLSX.utils.sheet_to_json(worksheet, {...opts, header: 1});
The sheet_to_json utility function walks a worksheet in row-major order, generating an array of objects. The second opts argument controls a number of export decisions including the type of values (JS values or formatted text). The "JSON" section describes the argument in more detail.
By default, sheet_to_json scans the first row and uses the values as headers. With the header: 1 option, the function exports an array of arrays of values.
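A brief sketch of both shapes, assuming headers in the first row (the cell values are hypothetical):
var rows = XLSX.utils.sheet_to_json(worksheet);
/* e.g. [ { Name: "SheetJS", Value: 7262 }, ... ] */
var aoa = XLSX.utils.sheet_to_json(worksheet, { header: 1 });
/* e.g. [ [ "Name", "Value" ], [ "SheetJS", 7262 ], ... ] */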
Examples
x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser. The xspreadsheet demo includes a sample script with the stox function for converting from a workbook to an x-spreadsheet data object. https://oss.sheetjs.com/sheetjs/x-spreadsheet is a live demo.
Previewing data in a React data grid (click to show)
react-data-grid is a data grid tailored for react. It expects two properties: rows of data objects and columns which describe the columns. For the purposes of massaging the data to fit the react data grid API it is easiest to start from an array of arrays.
This demo starts by fetching a remote file and using XLSX.read to extract:
import { useEffect, useState } from "react";
import DataGrid from "react-data-grid";
import { read, utils } from "xlsx";
const url = "https://oss.sheetjs.com/test_files/RkNumber.xls";
export default function App() {
const [columns, setColumns] = useState([]);
const [rows, setRows] = useState([]);
useEffect(() => {(async () => {
const wb = read(await (await fetch(url)).arrayBuffer(), { WTF: 1 });
/* use sheet_to_json with header: 1 to generate an array of arrays */
const data = utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]], { header: 1 });
/* see react-data-grid docs to understand the shape of the expected data */
setColumns(data[0].map((r) => ({ key: r, name: r })));
setRows(data.slice(1).map((r) => r.reduce((acc, x, i) => {
acc[data[0][i]] = x;
return acc;
}, {})));
})(); }, []);
return <DataGrid columns={columns} rows={rows} />;
}
Previewing data in a VueJS data grid (click to show)
vue3-table-lite is a simple VueJS 3 data table. It is featured in the VueJS demo.
Populating a database (SQL or no-SQL) (click to show)
The database demo includes examples of working with databases and query results.
Numerical Computations with TensorFlow.js (click to show)
@tensorflow/tfjs and other libraries expect data in simple arrays, well-suited for worksheets where each column is a data vector. That is the transpose of how most people use spreadsheets, where each row is a vector.
A single Array#map can pull individual named rows from the sheet_to_json export:
const XLSX = require("xlsx");
const tf = require('@tensorflow/tfjs');
const key = "age"; // this is the field we want to pull
const ages = XLSX.utils.sheet_to_json(worksheet).map(r => r[key]);
const tf_data = tf.tensor1d(ages);
All fields can be processed at once using a transpose of the 2D tensor generated with the sheet_to_json export with header: 1. The first row, if it contains header labels, should be removed with a slice:
const XLSX = require("xlsx");
const tf = require('@tensorflow/tfjs');
/* array of arrays of the data starting on the second row */
const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1}).slice(1);
/* dataset in the "correct orientation" */
const tf_dataset = tf.tensor2d(aoa).transpose();
/* pull out each dataset with a slice */
const tf_field0 = tf_dataset.slice([0,0], [1,tf_dataset.shape[1]]).flatten();
const tf_field1 = tf_dataset.slice([1,0], [1,tf_dataset.shape[1]]).flatten();
The array demo shows a complete example.
API
Generate HTML Table from Worksheet
var html = XLSX.utils.sheet_to_html(worksheet);
The sheet_to_html utility function generates HTML code based on the worksheet data. Each cell in the worksheet is mapped to a <TD> element. Merged cells in the worksheet are serialized by setting colspan and rowspan attributes.
Examples
The sheet_to_html utility function generates HTML code that can be added to any DOM element by setting the innerHTML:
var container = document.getElementById("tavolo");
container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
Combining with fetch, constructing a site from a workbook is straightforward:
Vanilla JS + HTML fetch workbook and generate table previews (click to show)
<body>
<style>TABLE { border-collapse: collapse; } TD { border: 1px solid; }</style>
<div id="tavolo"></div>
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
<script type="text/javascript">
(async() => {
/* fetch and parse workbook -- see the fetch example for details */
const workbook = XLSX.read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
let output = [];
/* loop through the worksheet names in order */
workbook.SheetNames.forEach(name => {
/* generate HTML from the corresponding worksheets */
const worksheet = workbook.Sheets[name];
const html = XLSX.utils.sheet_to_html(worksheet);
/* add a header with the title name followed by the table */
output.push(`<H3>${name}</H3>${html}`);
});
/* write to the DOM at the end */
tavolo.innerHTML = output.join("\n");
})();
</script>
</body>
React fetch workbook and generate HTML table previews (click to show)
It is generally recommended to use a React-friendly workflow, but it is possible to generate HTML and use it in React with dangerouslySetInnerHTML:
function Tabeller(props) {
/* the workbook object is the state */
const [workbook, setWorkbook] = React.useState(XLSX.utils.book_new());
/* fetch and update the workbook with an effect */
React.useEffect(() => { (async() => {
/* fetch and parse workbook -- see the fetch example for details */
const wb = XLSX.read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
setWorkbook(wb);
})(); }, []);
return workbook.SheetNames.map(name => (<>
<h3>{name}</h3>
<div dangerouslySetInnerHTML={{
/* this __html mantra is needed to set the inner HTML */
__html: XLSX.utils.sheet_to_html(workbook.Sheets[name])
}} />
</>));
}
The react demo includes more React examples.
VueJS fetch workbook and generate HTML table previews (click to show)
It is generally recommended to use a VueJS-friendly workflow, but it is possible to generate HTML and use it in VueJS with the v-html directive:
import { read, utils } from 'xlsx';
import { reactive } from 'vue';
const S5SComponent = {
mounted() { (async() => {
/* fetch and parse workbook -- see the fetch example for details */
const workbook = read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
/* loop through the worksheet names in order */
workbook.SheetNames.forEach(name => {
/* generate HTML from the corresponding worksheets */
const html = utils.sheet_to_html(workbook.Sheets[name]);
/* add to state */
this.wb.wb.push({ name, html });
});
})(); },
/* this state mantra is required for array updates to work */
setup() { return { wb: reactive({ wb: [] }) }; },
template: `
<div v-for="ws in wb.wb" :key="ws.name">
<h3>{{ ws.name }}</h3>
<div v-html="ws.html"></div>
</div>`
};
The vuejs demo includes more VueJS examples.
The sheet_to_* functions accept a worksheet object.
API
Generate a CSV from a single worksheet
var csv = XLSX.utils.sheet_to_csv(worksheet, opts);
This snapshot is designed to replicate the "CSV UTF8 (.csv)" output type. "Delimiter-Separated Output" describes the function and the optional opts argument in more detail.
Generate "Text" from a single worksheet
var txt = XLSX.utils.sheet_to_txt(worksheet, opts);
This snapshot is designed to replicate the "UTF16 Text (.txt)" output type. "Delimiter-Separated Output" describes the function and the optional opts argument in more detail.
Generate a list of formulae from a single worksheet
var fmla = XLSX.utils.sheet_to_formulae(worksheet);
This snapshot generates an array of entries representing the embedded formulae. Array formulae are rendered in the form range=formula while plain cells are rendered in the form cell=formula or value. String literals are prefixed with an apostrophe ', consistent with Excel's formula bar display.
"Formulae Output" describes the function in more detail.
XLSX is the exposed variable in the browser and the exported node variable.
XLSX.version is the version of the library (added by the build script).
XLSX.SSF is an embedded version of the format library.
XLSX.read(data, read_opts) attempts to parse data.
XLSX.readFile(filename, read_opts) attempts to read filename and parse.
Parse options are described in the Parsing Options section.
XLSX.write(wb, write_opts) attempts to write the workbook wb.
XLSX.writeFile(wb, filename, write_opts) attempts to write wb to filename. In browser-based environments, it will attempt to force a client-side download.
XLSX.writeFileAsync(filename, wb, o, cb) attempts to write wb to filename. If o is omitted, the writer will use the third argument as the callback.
XLSX.stream contains a set of streaming write functions.
Write options are described in the Writing Options section.
Utilities are available in the XLSX.utils object and are described in the Utility Functions section:
Constructing:
book_new creates an empty workbook
book_append_sheet adds a worksheet to a workbook
Importing:
aoa_to_sheet converts an array of arrays of JS data to a worksheet.
json_to_sheet converts an array of JS objects to a worksheet.
table_to_sheet converts a DOM TABLE element to a worksheet.
sheet_add_aoa adds an array of arrays of JS data to an existing worksheet.
sheet_add_json adds an array of JS objects to an existing worksheet.
Exporting:
sheet_to_json converts a worksheet object to an array of JSON objects.
sheet_to_csv generates delimiter-separated-values output.
sheet_to_txt generates UTF16 formatted text.
sheet_to_html generates HTML output.
sheet_to_formulae generates a list of the formulae (with value fallbacks).
Cell and cell address manipulation:
format_cell generates the text value for a cell (using number formats).
encode_row / decode_row converts between 0-indexed rows and 1-indexed rows.
encode_col / decode_col converts between 0-indexed columns and column names.
encode_cell / decode_cell converts cell addresses.
encode_range / decode_range converts cell ranges.
SheetJS conforms to the Common Spreadsheet Format (CSF):
Cell address objects are stored as {c:C, r:R} where C and R are 0-indexed column and row numbers, respectively. For example, the cell address B5 is represented by the object {c:1, r:4}.
Cell range objects are stored as {s:S, e:E} where S is the first cell and E is the last cell in the range. The ranges are inclusive. For example, the range A3:B7 is represented by the object {s:{c:0, r:2}, e:{c:1, r:6}}. Utility functions perform a row-major order walk traversal of a sheet range:
for(var R = range.s.r; R <= range.e.r; ++R) {
for(var C = range.s.c; C <= range.e.c; ++C) {
var cell_address = {c:C, r:R};
/* if an A1-style address is needed, encode the address */
var cell_ref = XLSX.utils.encode_cell(cell_address);
}
}
Cell objects are plain JS objects with keys and values following the convention:
Key | Description |
---|---|
v | raw value (see Data Types section for more info) |
w | formatted text (if applicable) |
t | type: b Boolean, e Error, n Number, d Date, s Text, z Stub |
f | cell formula encoded as an A1-style string (if applicable) |
F | range of enclosing array if formula is array formula (if applicable) |
D | if true, array formula is dynamic (if applicable) |
r | rich text encoding (if applicable) |
h | HTML rendering of the rich text (if applicable) |
c | comments associated with the cell |
z | number format string associated with the cell (if requested) |
l | cell hyperlink object (.Target holds link, .Tooltip is tooltip) |
s | the style/theme of the cell (if applicable) |
Built-in export utilities (such as the CSV exporter) will use the w text if it is available. To change a value, be sure to delete cell.w (or set it to undefined) before attempting to export. The utilities will regenerate the w text from the number format (cell.z) and the raw value if possible.
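For example, a minimal sketch of updating a value in place (assuming ws is a worksheet with a numeric cell in B2):
var cell = ws["B2"]; // hypothetical worksheet and address
cell.v = 3.14159; // assign the new raw value
delete cell.w; // remove the stale formatted text
/* exporters will regenerate cell.w from cell.z and the new cell.v */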
The actual array formula is stored in the f
field of the first cell in the array range. Other cells in the range will omit the f
field.
The raw value is stored in the v
value property, interpreted based on the t
type property. This separation allows for representation of numbers as well as numeric text. There are 6 valid cell types:
Type | Description |
---|---|
b | Boolean: value interpreted as JS boolean |
e | Error: value is a numeric code and w property stores common name ** |
n | Number: value is a JS number ** |
d | Date: value is a JS Date object or string to be parsed as Date ** |
s | Text: value interpreted as JS string and written as text ** |
z | Stub: blank stub cell that is ignored by data processing utilities ** |
Error values and interpretation (click to show)
Value | Error Meaning |
---|---|
0x00 | #NULL! |
0x07 | #DIV/0! |
0x0F | #VALUE! |
0x17 | #REF! |
0x1D | #NAME? |
0x24 | #NUM! |
0x2A | #N/A |
0x2B | #GETTING_DATA |
Type n is the Number type. This includes all forms of data that Excel stores as numbers, such as dates/times and Boolean fields. Excel exclusively uses data that can fit in an IEEE754 floating point number, just like JS Number, so the v field holds the raw number. The w field holds formatted text. Dates are stored as numbers by default and converted with XLSX.SSF.parse_date_code.
Type d is the Date type, generated only when the option cellDates is passed. Since JSON does not have a natural Date type, parsers are generally expected to store ISO 8601 Date strings like you would get from date.toISOString(). On the other hand, writers and exporters should be able to handle date strings and JS Date objects. Note that Excel disregards timezone modifiers and treats all dates in the local timezone. The library does not correct for this error.
Type s
is the String type. Values are explicitly stored as text. Excel will interpret these cells as "number stored as text". Generated Excel files automatically suppress that class of error, but other formats may elicit errors.
Type z represents blank stub cells. They are generated in cases where cells have no assigned value but hold comments or other metadata. They are ignored by the core library data processing utility functions. By default these cells are not generated; the parser sheetStubs option must be set to true.
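For example, a sketch of a read call that keeps stub cells (data is a hypothetical file payload):
var wb = XLSX.read(data, { sheetStubs: true }); // stub cells appear with type z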
Excel Date Code details (click to show)
By default, Excel stores dates as numbers with a format code that specifies date processing. For example, the date 19-Feb-17
is stored as the number 42785
with a number format of d-mmm-yy
. The SSF
module understands number formats and performs the appropriate conversion.
XLSX also supports a special date type d
where the data is an ISO 8601 date string. The formatter converts the date back to a number.
The default behavior for all parsers is to generate number cells. Setting cellDates
to true will force the generators to store dates.
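For example, a sketch of requesting date cells on both the read and write sides (the file name is hypothetical):
var wb = XLSX.read(data, { cellDates: true }); // parse dates as type d
XLSX.writeFile(wb, "out.xlsx", { cellDates: true }); // store dates as type d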
Time Zones and Dates (click to show)
Excel has no native concept of universal time. All times are specified in the local time zone. Excel limitations prevent specifying true absolute dates.
Following Excel, this library treats all dates as relative to local time zone.
Epochs: 1900 and 1904 (click to show)
Excel supports two epochs (January 1 1900 and January 1 1904). The workbook's epoch can be determined by examining the workbook's wb.Workbook.WBProps.date1904
property:
!!(((wb.Workbook||{}).WBProps||{}).date1904)
Each key that does not start with ! maps to a cell (using A-1 notation). sheet[address] returns the cell object for the specified address.
Special sheet keys (accessible as sheet[key], each starting with !):
sheet['!ref']: A-1 based range representing the sheet range. Functions that work with sheets should use this parameter to determine the range. Cells that are assigned outside of the range are not processed. In particular, when writing a sheet by hand, cells outside of the range are not included.
Functions that handle sheets should test for the presence of the !ref field. If the !ref is omitted or is not a valid range, functions are free to treat the sheet as empty or attempt to guess the range. The standard utilities that ship with this library treat sheets as empty (for example, the CSV output is an empty string).
When reading a worksheet with the sheetRows property set, the ref parameter will use the restricted range. The original range is set at ws['!fullref'].
sheet['!margins']: Object representing the page margins. The default values follow Excel's "normal" preset. Excel also has a "wide" and a "narrow" preset but they are stored as raw measurements. The main properties are listed below:
Page margin details (click to show)
key | description | "normal" | "wide" | "narrow" |
---|---|---|---|---|
left | left margin (inches) | 0.7 | 1.0 | 0.25 |
right | right margin (inches) | 0.7 | 1.0 | 0.25 |
top | top margin (inches) | 0.75 | 1.0 | 0.75 |
bottom | bottom margin (inches) | 0.75 | 1.0 | 0.75 |
header | header margin (inches) | 0.3 | 0.5 | 0.3 |
footer | footer margin (inches) | 0.3 | 0.5 | 0.3 |
/* Set worksheet sheet to "normal" */
ws["!margins"]={left:0.7, right:0.7, top:0.75,bottom:0.75,header:0.3,footer:0.3}
/* Set worksheet sheet to "wide" */
ws["!margins"]={left:1.0, right:1.0, top:1.0, bottom:1.0, header:0.5,footer:0.5}
/* Set worksheet sheet to "narrow" */
ws["!margins"]={left:0.25,right:0.25,top:0.75,bottom:0.75,header:0.3,footer:0.3}
In addition to the base sheet keys, worksheets also add:
ws['!cols']: array of column properties objects. Column widths are actually stored in files in a normalized manner, measured in terms of the "Maximum Digit Width" (the largest width of the rendered digits 0-9, in pixels). When parsed, the column objects store the pixel width in the wpx field, character width in the wch field, and the maximum digit width in the MDW field.
ws['!rows']: array of row properties objects as explained later in the docs. Each row object encodes properties including row height and visibility.
ws['!merges']: array of range objects corresponding to the merged cells in the worksheet. Plain text formats do not support merge cells. CSV export will write all cells in the merge range if they exist, so be sure that only the first cell (upper-left) in the range is set.
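For example, a minimal sketch that merges A1:B2 and assigns data only to the upper-left cell:
if(!ws["!merges"]) ws["!merges"] = [];
ws["!merges"].push(XLSX.utils.decode_range("A1:B2")); // range object for the merge
ws["A1"] = { t: "s", v: "merged" }; // only the first cell in the range is set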
ws['!outline']: configure how outlines should behave. Options default to the default settings in Excel 2019:
key | Excel feature | default |
---|---|---|
above | Uncheck "Summary rows below detail" | false |
left | Uncheck "Summary rows to the right of detail" | false |
ws['!protect']: object of write sheet protection properties. The password key specifies the password for formats that support password-protected sheets (XLSX/XLSB/XLS). The writer uses the XOR obfuscation method. The following keys control the sheet protection -- set to false to enable a feature when the sheet is locked or set to true to disable a feature (see the sketch after the table):
Worksheet Protection Details (click to show)
key | feature (true=disabled / false=enabled) | default |
---|---|---|
selectLockedCells | Select locked cells | enabled |
selectUnlockedCells | Select unlocked cells | enabled |
formatCells | Format cells | disabled |
formatColumns | Format columns | disabled |
formatRows | Format rows | disabled |
insertColumns | Insert columns | disabled |
insertRows | Insert rows | disabled |
insertHyperlinks | Insert hyperlinks | disabled |
deleteColumns | Delete columns | disabled |
deleteRows | Delete rows | disabled |
sort | Sort | disabled |
autoFilter | Filter | disabled |
pivotTables | Use PivotTable reports | disabled |
objects | Edit objects | enabled |
scenarios | Edit scenarios | enabled |
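For example, a sketch that locks a sheet but leaves sorting and filtering enabled (the password value is hypothetical):
ws["!protect"] = {
password: "hunter2", // hypothetical password (XOR obfuscation for XLSX/XLSB/XLS)
sort: false, // false enables sorting while the sheet is locked
autoFilter: false // false enables filtering while the sheet is locked
};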
ws['!autofilter']: AutoFilter object following the schema:
type AutoFilter = {
ref:string; // A-1 based range representing the AutoFilter table range
}
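For example, a sketch that applies an AutoFilter across the header row of the example sheet:
ws["!autofilter"] = { ref: "A1:G1" }; // A-1 based range for the filter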
Chartsheets are represented as standard sheets. They are distinguished with the !type property set to "chart".
The underlying data and !ref refer to the cached data in the chartsheet. The first row of the chartsheet is the underlying header.
Macrosheets are represented as standard sheets. They are distinguished with the !type property set to "macro".
Dialogsheets are represented as standard sheets. They are distinguished with the !type property set to "dialog".
workbook.SheetNames
is an ordered list of the sheets in the workbook
wb.Sheets[sheetname]
returns an object representing the worksheet.
wb.Props
is an object storing the standard properties. wb.Custprops
stores custom properties. Since the XLS standard properties deviate from the XLSX standard, XLS parsing stores core properties in both places.
wb.Workbook
stores workbook-level attributes.
The various file formats use different internal names for file properties. The workbook Props
object normalizes the names:
File Properties (click to show)
JS Name | Excel Description |
---|---|
Title | Summary tab "Title" |
Subject | Summary tab "Subject" |
Author | Summary tab "Author" |
Manager | Summary tab "Manager" |
Company | Summary tab "Company" |
Category | Summary tab "Category" |
Keywords | Summary tab "Keywords" |
Comments | Summary tab "Comments" |
LastAuthor | Statistics tab "Last saved by" |
CreatedDate | Statistics tab "Created" |
For example, to set the workbook title property:
if(!wb.Props) wb.Props = {};
wb.Props.Title = "Insert Title Here";
Custom properties are added in the workbook Custprops
object:
if(!wb.Custprops) wb.Custprops = {};
wb.Custprops["Custom Property"] = "Custom Value";
Writers will process the Props
key of the options object:
/* force the Author to be "SheetJS" */
XLSX.write(wb, {Props:{Author:"SheetJS"}});
wb.Workbook
stores workbook-level attributes.
wb.Workbook.Names
is an array of defined name objects which have the keys:
Defined Name Properties (click to show)
Key | Description |
---|---|
Sheet | Name scope. Sheet Index (0 = first sheet) or null (Workbook) |
Name | Case-sensitive name. Standard rules apply ** |
Ref | A1-style Reference ("Sheet1!$A$1:$D$20" ) |
Comment | Comment (only applicable for XLS/XLSX/XLSB) |
Excel allows two sheet-scoped defined names to share the same name. However, a sheet-scoped name cannot collide with a workbook-scope name. Workbook writers may not enforce this constraint.
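For example, a minimal sketch that adds a workbook-scoped defined name (the name and reference are hypothetical):
if(!wb.Workbook) wb.Workbook = {};
if(!wb.Workbook.Names) wb.Workbook.Names = [];
wb.Workbook.Names.push({
Name: "SourceData", // hypothetical case-sensitive name
Ref: "Sheet1!$A$1:$D$20" // A1-style reference; a null Sheet key means workbook scope
});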
wb.Workbook.Views
is an array of workbook view objects which have the keys:
Key | Description |
---|---|
RTL | If true, display right-to-left |
wb.Workbook.WBProps
holds other workbook properties:
Key | Description |
---|---|
CodeName | VBA Project Workbook Code Name |
date1904 | epoch: 0/false for 1900 system, 1/true for 1904 |
filterPrivacy | Warn or strip personally identifying info on save |
Even for basic features like date storage, the official Excel formats store the same content in different ways. The parsers are expected to convert from the underlying file format representation to the Common Spreadsheet Format. Writers are expected to convert from CSF back to the underlying file format.
The A1-style formula string is stored in the f field. Although different file formats store formulae in different ways, the representations are translated. Although some formats store formulae with a leading equal sign, CSF formulae do not start with =.
Formulae File Format Support (click to show)
Storage Representation | Formats | Read | Write |
---|---|---|---|
A1-style strings | XLSX | ✔ | ✔ |
RC-style strings | XLML and plain text | ✔ | ✔ |
BIFF Parsed formulae | XLSB and all XLS formats | ✔ | |
OpenFormula formulae | ODS/FODS/UOS | ✔ | ✔ |
Lotus Parsed formulae | All Lotus WK_ formats | ✔ |
Since Excel prohibits named cells from colliding with names of A1 or RC style cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed formulae and Lotus Parsed formulae have to be explicitly unwound. OpenFormula formulae can be converted with regular expressions.
Shared formulae are decompressed and each cell has the formula corresponding to its cell. Writers generally do not attempt to generate shared formulae.
Single-Cell Formulae
For simple formulae, the f key of the desired cell can be set to the actual formula text. This worksheet represents A1=1, A2=2, and A3=A1+A2:
var worksheet = {
"!ref": "A1:A3",
A1: { t:'n', v:1 },
A2: { t:'n', v:2 },
A3: { t:'n', v:3, f:'A1+A2' }
};
Utilities like aoa_to_sheet
will accept cell objects in lieu of values:
var worksheet = XLSX.utils.aoa_to_sheet([
[ 1 ], // A1
[ 2 ], // A2
[ {t: "n", v: 3, f: "A1+A2"} ] // A3
]);
Cells with formula entries but no value will be serialized in a way that Excel and other spreadsheet tools will recognize. This library will not automatically compute formula results! For example, the following worksheet will include the BESSELJ
function but the result will not be available in JavaScript:
var worksheet = XLSX.utils.aoa_to_sheet([
[ 3.14159, 2 ], // Row "1"
[ { t:'n', f:'BESSELJ(A1,B1)' } ] // Row "2" will be calculated on file open
]);
If the actual results are needed in JS, SheetJS Pro offers a formula calculator component for evaluating expressions, updating values and dependent cells, and refreshing entire workbooks.
Array Formulae
Assign an array formula
XLSX.utils.sheet_set_array_formula(worksheet, range, formula);
Array formulae are stored in the top-left cell of the array block. All cells of an array formula have an F field corresponding to the range. A single-cell formula can be distinguished from a plain formula by the presence of the F field.
For example, setting the cell C1 to the array formula {=SUM(A1:A3*B1:B3)}:
// API function
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "SUM(A1:A3*B1:B3)");
// ... OR raw operations
worksheet['C1'] = { t:'n', f: "SUM(A1:A3*B1:B3)", F:"C1:C1" };
For a multi-cell array formula, every cell has the same array range but only the first cell specifies the formula. Consider D1:D3=A1:A3*B1:B3:
// API function
XLSX.utils.sheet_set_array_formula(worksheet, "D1:D3", "A1:A3*B1:B3");
// ... OR raw operations
worksheet['D1'] = { t:'n', F:"D1:D3", f:"A1:A3*B1:B3" };
worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
Utilities and writers are expected to check for the presence of an F field and ignore any possible formula element f in cells other than the starting cell. They are not expected to perform validation of the formulae!
Dynamic Array Formulae
Assign a dynamic array formula
XLSX.utils.sheet_set_array_formula(worksheet, range, formula, true);
Released in 2020, Dynamic Array Formulae are supported in the XLSX/XLSM and XLSB file formats. They are represented like normal array formulae but have special cell metadata indicating that the formula should be allowed to adjust the range.
An array formula can be marked as dynamic by setting the cell's D property to true. The F range is expected but can be set to the current cell:
// API function
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "_xlfn.UNIQUE(A1:A3)", 1);
// ... OR raw operations
worksheet['C1'] = { t: "s", f: "_xlfn.UNIQUE(A1:A3)", F:"C1", D: 1 }; // dynamic
Localization with Function Names
SheetJS operates at the file level. Excel stores formula expressions using the English (United States) function names. For non-English users, Excel uses a localized set of function names.
For example, when the computer language and region are set to French (France), Excel interprets =SOMME(A1:C3) as if SOMME were the SUM function. However, in the actual file, Excel stores SUM(A1:C3).
Prefixed "Future Functions"
Functions introduced in newer versions of Excel are prefixed with _xlfn.
when stored in files. When writing formula expressions using these functions, the prefix is required for maximal compatibility:
// Broadest compatibility
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "_xlfn.UNIQUE(A1:A3)", 1);
// Can cause errors in spreadsheet software
XLSX.utils.sheet_set_array_formula(worksheet, "C1", "UNIQUE(A1:A3)", 1);
When reading a file, the xlfn
option preserves the prefixes.
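For example, a sketch of a read call that keeps the prefixes (data is a hypothetical payload):
var wb = XLSX.read(data, { xlfn: true }); // formulae keep the _xlfn. prefix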
Functions requiring `_xlfn.` prefix (click to show)
This list is growing with each Excel release.
ACOT
ACOTH
AGGREGATE
ARABIC
BASE
BETA.DIST
BETA.INV
BINOM.DIST
BINOM.DIST.RANGE
BINOM.INV
BITAND
BITLSHIFT
BITOR
BITRSHIFT
BITXOR
BYCOL
BYROW
CEILING.MATH
CEILING.PRECISE
CHISQ.DIST
CHISQ.DIST.RT
CHISQ.INV
CHISQ.INV.RT
CHISQ.TEST
COMBINA
CONFIDENCE.NORM
CONFIDENCE.T
COT
COTH
COVARIANCE.P
COVARIANCE.S
CSC
CSCH
DAYS
DECIMAL
ERF.PRECISE
ERFC.PRECISE
EXPON.DIST
F.DIST
F.DIST.RT
F.INV
F.INV.RT
F.TEST
FIELDVALUE
FILTERXML
FLOOR.MATH
FLOOR.PRECISE
FORMULATEXT
GAMMA
GAMMA.DIST
GAMMA.INV
GAMMALN.PRECISE
GAUSS
HYPGEOM.DIST
IFNA
IMCOSH
IMCOT
IMCSC
IMCSCH
IMSEC
IMSECH
IMSINH
IMTAN
ISFORMULA
ISOMITTED
ISOWEEKNUM
LAMBDA
LET
LOGNORM.DIST
LOGNORM.INV
MAKEARRAY
MAP
MODE.MULT
MODE.SNGL
MUNIT
NEGBINOM.DIST
NORM.DIST
NORM.INV
NORM.S.DIST
NORM.S.INV
NUMBERVALUE
PDURATION
PERCENTILE.EXC
PERCENTILE.INC
PERCENTRANK.EXC
PERCENTRANK.INC
PERMUTATIONA
PHI
POISSON.DIST
QUARTILE.EXC
QUARTILE.INC
QUERYSTRING
RANDARRAY
RANK.AVG
RANK.EQ
REDUCE
RRI
SCAN
SEC
SECH
SEQUENCE
SHEET
SHEETS
SKEW.P
SORTBY
STDEV.P
STDEV.S
T.DIST
T.DIST.2T
T.DIST.RT
T.INV
T.INV.2T
T.TEST
UNICHAR
UNICODE
UNIQUE
VAR.P
VAR.S
WEBSERVICE
WEIBULL.DIST
XLOOKUP
XOR
Z.TEST
Format Support (click to show)
Row Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS
Column Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM
Row and Column properties are not extracted by default when reading from a file and are not persisted by default when writing to a file. The option cellStyles: true
must be passed to the relevant read or write function.
Column Properties
The !cols
array in each worksheet, if present, is a collection of ColInfo
objects which have the following properties:
type ColInfo = {
/* visibility */
hidden?: boolean; // if true, the column is hidden
/* column width is specified in one of the following ways: */
wpx?: number; // width in screen pixels
width?: number; // width in Excel's "Max Digit Width", width*256 is integral
wch?: number; // width in characters
/* other fields for preserving features from files */
level?: number; // 0-indexed outline / group level
MDW?: number; // Excel's "Max Digit Width" unit, always integral
};
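For example, a sketch that sets properties for the first three columns (each column object uses one of the width fields):
ws["!cols"] = [
{ wch: 8 }, // first column: 8 characters wide
{ wpx: 100 }, // second column: 100 screen pixels wide
{ hidden: true } // third column: hidden
];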
Row Properties
The !rows
array in each worksheet, if present, is a collection of RowInfo
objects which have the following properties:
type RowInfo = {
/* visibility */
hidden?: boolean; // if true, the row is hidden
/* row height is specified in one of the following ways: */
hpx?: number; // height in screen pixels
hpt?: number; // height in points
level?: number; // 0-indexed outline / group level
};
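For example, a sketch that sets properties for the first two rows:
ws["!rows"] = [
{ hpt: 20 }, // first row: 20 points tall
{ hidden: true } // second row: hidden
];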
Outline / Group Levels Convention
The Excel UI displays the base outline level as 1 and the max level as 8. Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base outline level is 0 and the max level is 7.
Why are there three width types? (click to show)
There are three different width types corresponding to the three different ways spreadsheets store column widths:
SYLK and other plain text formats use raw character count. Contemporaneous tools like Visicalc and Multiplan were character based. Since the characters had the same width, it sufficed to store a count. This tradition was continued into the BIFF formats.
SpreadsheetML (2003) tried to align with HTML by standardizing on screen pixel count throughout the file. Column widths, row heights, and other measures use pixels. When the pixel and character counts do not align, Excel rounds values.
XLSX internally stores column widths in a nebulous "Max Digit Width" form. The Max Digit Width is the width of the largest digit when rendered (generally the "0" character is the widest). The internal width must be an integer multiple of the width divided by 256. ECMA-376 describes a formula for converting between pixels and the internal width. This represents a hybrid approach.
Read functions attempt to populate all three properties. Write functions will try to cycle specified values to the desired type. In order to avoid potential conflicts, manipulation should delete the other properties first. For example, when changing the pixel width, delete the wch
and width
properties.
Implementation details (click to show)
Row Heights
Excel internally stores row heights in points. The default resolution is 72 DPI or 96 PPI, so the pixel and point size should agree. For different resolutions they may not agree, so the library separates the concepts.
Even though all of the information is made available, writers are expected to follow the priority order:
hpx pixel height, if available
hpt point height, if available
Column Widths
Given the constraints, it is possible to determine the MDW without actually inspecting the font! The parsers guess the pixel width by converting from width to pixels and back, repeating for all possible MDW and selecting the MDW that minimizes the error. XLML actually stores the pixel width, so the guess works in the opposite direction.
Even though all of the information is made available, writers are expected to follow the priority order:
width field, if available
wpx pixel width, if available
wch character count, if available
The cell.w formatted text for each cell is produced from cell.v and the cell.z format. If the format is not specified, the Excel General format is used. The format can either be specified as a string or as an index into the format table. Parsers are expected to populate workbook.SSF with the number format table. Writers are expected to serialize the table.
Custom tools should ensure that the local table has each used format string somewhere in the table. Excel convention mandates that the custom formats start at index 164. The following example creates a custom format from scratch:
New worksheet with custom format (click to show)
var wb = {
SheetNames: ["Sheet1"],
Sheets: {
Sheet1: {
"!ref":"A1:C1",
A1: { t:"n", v:10000 }, // <-- General format
B1: { t:"n", v:10000, z: "0%" }, // <-- Builtin format
C1: { t:"n", v:10000, z: "\"T\"\ #0.00" } // <-- Custom format
}
}
}
The rules are slightly different from how Excel displays custom number formats. In particular, literal characters must be wrapped in double quotes or preceded by a backslash. For more info, see the Excel documentation article Create or delete a custom number format
or ECMA-376 18.8.31 (Number Formats)
Default Number Formats (click to show)
The default formats are listed in ECMA-376 18.8.30:
ID | Format |
---|---|
0 | General |
1 | 0 |
2 | 0.00 |
3 | #,##0 |
4 | #,##0.00 |
9 | 0% |
10 | 0.00% |
11 | 0.00E+00 |
12 | # ?/? |
13 | # ??/?? |
14 | m/d/yy (see below) |
15 | d-mmm-yy |
16 | d-mmm |
17 | mmm-yy |
18 | h:mm AM/PM |
19 | h:mm:ss AM/PM |
20 | h:mm |
21 | h:mm:ss |
22 | m/d/yy h:mm |
37 | #,##0 ;(#,##0) |
38 | #,##0 ;[Red](#,##0) |
39 | #,##0.00;(#,##0.00) |
40 | #,##0.00;[Red](#,##0.00) |
45 | mm:ss |
46 | [h]:mm:ss |
47 | mmss.0 |
48 | ##0.0E+0 |
49 | @ |
Format 14 (m/d/yy) is localized by Excel: even though the file specifies that number format, it will be drawn differently based on system settings. It makes sense when the producer and consumer of files are in the same locale, but that is not always the case over the Internet. To get around this ambiguity, parse functions accept the dateNF option to override the interpretation of that specific format string.
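For example, a sketch that renders date code 14 cells with an ISO-style string when parsing (data is a hypothetical payload):
var wb = XLSX.read(data, { dateNF: "yyyy-mm-dd" }); // override format 14 display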
Format Support (click to show)
Cell Hyperlinks: XLSX/M, XLSB, BIFF8 XLS, XLML, ODS
Tooltips: XLSX/M, XLSB, BIFF8 XLS, XLML
Hyperlinks are stored in the l
key of cell objects. The Target
field of the hyperlink object is the target of the link, including the URI fragment. Tooltips are stored in the Tooltip
field and are displayed when you move your mouse over the text.
For example, the following snippet creates a link from cell A1 to https://sheetjs.com with the tip "Find us @ SheetJS.com!":
ws['A1'].l = { Target:"https://sheetjs.com", Tooltip:"Find us @ SheetJS.com!" };
Note that Excel does not automatically style hyperlinks -- they will generally be displayed as normal text.
Remote Links
HTTP / HTTPS links can be used directly:
ws['A2'].l = { Target:"https://docs.sheetjs.com/#hyperlinks" };
ws['A3'].l = { Target:"http://localhost:7262/yes_localhost_works" };
Excel also supports mailto
email links with subject line:
ws['A4'].l = { Target:"mailto:ignored@dev.null" };
ws['A5'].l = { Target:"mailto:ignored@dev.null?subject=Test Subject" };
Local Links
Links to absolute paths should use the file://
URI scheme:
ws['B1'].l = { Target:"file:///SheetJS/t.xlsx" }; /* Link to /SheetJS/t.xlsx */
ws['B2'].l = { Target:"file:///c:/SheetJS.xlsx" }; /* Link to c:\SheetJS.xlsx */
Links to relative paths can be specified without a scheme:
ws['B3'].l = { Target:"SheetJS.xlsb" }; /* Link to SheetJS.xlsb */
ws['B4'].l = { Target:"../SheetJS.xlsm" }; /* Link to ../SheetJS.xlsm */
Relative Paths have undefined behavior in the SpreadsheetML 2003 format. Excel 2019 will treat a ..\
parent mark as two levels up.
Internal Links
Links where the target is a cell or range or defined name in the same workbook ("Internal Links") are marked with a leading hash character:
ws['C1'].l = { Target:"#E2" }; /* Link to cell E2 */
ws['C2'].l = { Target:"#Sheet2!E2" }; /* Link to cell E2 in sheet Sheet2 */
ws['C3'].l = { Target:"#SomeDefinedName" }; /* Link to Defined Name */
Cell comments are objects stored in the c
array of cell objects. The actual contents of the comment are split into blocks based on the comment author. The a
field of each comment object is the author of the comment and the t
field is the plain text representation.
For example, the following snippet appends a cell comment into cell A1:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"I'm a little comment, short and stout!"});
Note: XLSB enforces a 54 character limit on the Author name. Names longer than 54 characters may cause issues with other formats.
To mark a comment as normally hidden, set the hidden
property:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This comment is visible"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This comment will be hidden"});
Threaded Comments
Introduced in Excel 365, threaded comments are plain text comment snippets with author metadata and parent references. They are supported in XLSX and XLSB.
To mark a comment as threaded, each comment part must have a true T
property:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This is not threaded"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This is threaded", T: true});
ws.A2.c.push({a:"JSSheet", t:"This is also threaded", T: true});
There is no Active Directory or Office 365 metadata associated with authors in a thread.
Excel enables hiding sheets in the lower tab bar. The sheet data is stored in the file but the UI does not readily make it available. Standard hidden sheets are revealed in the "Unhide" menu. Excel also has "very hidden" sheets which cannot be revealed in the menu. It is only accessible in the VB Editor!
The visibility setting is stored in the Hidden property of the sheet props array.
More details (click to show)
Value | Definition |
---|---|
0 | Visible |
1 | Hidden |
2 | Very Hidden |
With https://rawgit.com/SheetJS/test_files/HEAD/sheet_visibility.xlsx:
> wb.Workbook.Sheets.map(function(x) { return [x.name, x.Hidden] })
[ [ 'Visible', 0 ], [ 'Hidden', 1 ], [ 'VeryHidden', 2 ] ]
Non-Excel formats do not support the Very Hidden state. The best way to test if a sheet is visible is to check if the Hidden
property is logical truth:
> wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
[ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]
VBA Macros are stored in a special data blob that is exposed in the vbaraw property of the workbook object when the bookVBA option is true. They are supported in the XLSM, XLSB, and BIFF8 XLS formats. The supported format writers automatically insert the data blob if it is present in the workbook and associate it with the worksheet names.
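For example, a sketch that copies the macro blob from one workbook to another (file names are hypothetical; code names may need to be set manually, as the Custom Code Names note below explains):
var src = XLSX.readFile("macros.xlsm", { bookVBA: true }); // expose vbaraw
var dst = XLSX.readFile("plain.xlsx");
dst.vbaraw = src.vbaraw; // attach the VBA blob
XLSX.writeFile(dst, "combined.xlsm"); // write to a macro-enabled format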
Custom Code Names (click to show)
The workbook code name is stored in wb.Workbook.WBProps.CodeName. By default, Excel will write ThisWorkbook or a translated phrase like DieseArbeitsmappe. Worksheet and Chartsheet code names are in the worksheet properties object at wb.Workbook.Sheets[i].CodeName. Macrosheets and Dialogsheets are ignored.
The readers and writers preserve the code names, but they have to be manually set when adding a VBA blob to a different workbook.
Macrosheets (click to show)
Older versions of Excel also supported a non-VBA "macrosheet" sheet type that stored automation commands. These are exposed in objects with the !type property set to "macro".
Detecting macros in workbooks (click to show)
The vbaraw
field will only be set if macros are present, so testing is simple:
function wb_has_macro(wb/*:workbook*/)/*:boolean*/ {
if(!!wb.vbaraw) return true;
const sheets = wb.SheetNames.map((n) => wb.Sheets[n]);
return sheets.some((ws) => !!ws && ws['!type']=='macro');
}
The exported read
and readFile
functions accept an options argument:
Option Name | Default | Description |
---|---|---|
type | | Input data encoding (see Input Type below) |
raw | false | If true, plain text parsing will not parse values ** |
codepage | | If specified, use code page when appropriate ** |
cellFormula | true | Save formulae to the .f field |
cellHTML | true | Parse rich text and save HTML to the .h field |
cellNF | false | Save number format string to the .z field |
cellStyles | false | Save style/theme info to the .s field |
cellText | true | Generated formatted text to the .w field |
cellDates | false | Store dates as type d (default is n ) |
dateNF | | If specified, use the string for date code 14 ** |
sheetStubs | false | Create cell objects of type z for stub cells |
sheetRows | 0 | If >0, read the first sheetRows rows ** |
bookDeps | false | If true, parse calculation chains |
bookFiles | false | If true, add raw files to book object ** |
bookProps | false | If true, only parse enough to get book metadata ** |
bookSheets | false | If true, only parse enough to get the sheet names |
bookVBA | false | If true, copy VBA blob to vbaraw field ** |
password | "" | If defined and file is encrypted, use password ** |
WTF | false | If true, throw errors on unexpected file features ** |
sheets | | If specified, only parse specified sheets ** |
PRN | false | If true, allow parsing of PRN files ** |
xlfn | false | If true, preserve _xlfn. prefixes in formulae ** |
FS | | DSV Field Separator override |
Even if cellNF is false, formatted text will be generated and saved to .w
In some cases, sheets may be parsed even if bookSheets is false.
Excel aggressively tries to interpret values from CSV and other plain text; the raw option suppresses value parsing.
bookSheets and bookProps combine to give both sets of information
Deps will be an empty object if bookDeps is false
bookFiles behavior depends on file type: keys array (paths in the ZIP) for ZIP-based formats, files hash (mapping paths to objects representing the files) for ZIP, cfb object for formats using CFB containers
sheetRows-1 rows will be generated when looking at the JSON object output (since the header row is counted as a row when parsing the data)
sheets restricts based on input type: numbers are interpreted as 0-indexed worksheet indices (0 is first worksheet) and strings are matched against the worksheet names
bookVBA merely exposes the raw VBA CFB object. It does not parse the data. XLSM and XLSB store the VBA CFB object in xl/vbaProject.bin. BIFF8 XLS mixes the VBA entries alongside the core Workbook entry, so the library generates a new XLSB-compatible blob from the XLS CFB container.
codepage is applied to BIFF2 - BIFF5 files without CodePage records and to CSV files without BOM in type:"binary". BIFF8 XLS always defaults to 1200.
PRN affects parsing of text files without a common delimiter character.
Newer functions are serialized with the _xlfn. prefix, hidden from the user. SheetJS will strip _xlfn. normally; the xlfn option preserves them.
WTF:true forces errors on unexpected file features to be thrown instead of suppressed.
Strings can be interpreted in multiple ways. The type parameter for read tells the library how to parse the data argument:
type | expected input |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n) ) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | array: array of 8-bit unsigned int (byte n is data[n] ) |
"file" | string: path of file that will be read (nodejs only) |
Implementation Details (click to show)
Excel and other spreadsheet tools read the first few bytes and apply other heuristics to determine a file type. This enables file type punning: renaming files with the .xls
extension will tell your computer to use Excel to open the file but Excel will know how to handle it. This library applies similar logic:
Byte 0 | Raw File Type | Spreadsheet Types |
---|---|---|
0xD0 | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR |
0x09 | BIFF Stream | BIFF 2/3/4/5 |
0x3C | XML/HTML | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x50 | ZIP Archive | XLSB or XLSX/M or ODS or UOS2 or NUMBERS or text |
0x49 | Plain Text | SYLK or plain text |
0x54 | Plain Text | DIF or plain text |
0xEF | UTF8 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0xFF | UTF16 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x00 | Record Stream | Lotus WK* or Quattro Pro or plain text |
0x7B | Plain text | RTF or plain text |
0x0A | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x0D | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x20 | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
DBF files are detected based on the first byte as well as the third and fourth bytes (corresponding to month and day of the file date)
Works for Windows files are detected based on the BOF record with type 0xFF
Plain text format guessing follows the priority order:
Format | Test |
---|---|
XML | <?xml appears in the first 1024 characters |
HTML | starts with < and HTML tags appear in the first 1024 characters * |
XML | starts with < and the first tag is valid |
RTF | starts with {\rt |
DSV | starts with /sep=.$/ , separator is the specified character |
DSV | more unquoted \| chars than ; \t or , in the first 1024 |
DSV | more unquoted ; chars than \t or , in the first 1024 |
TSV | more unquoted \t chars than , chars in the first 1024 |
CSV | one of the first 1024 characters is a comma "," |
ETH | starts with socialcalc:version: |
PRN | PRN option is set to true |
CSV | (fallback) |
* HTML tags include: html, table, head, meta, script, style, div
Why are random text files valid? (click to show)
Excel is extremely aggressive in reading files. Adding an XLS extension to any display text file (where the only characters are ANSI display chars) tricks Excel into thinking that the file is potentially a CSV or TSV file, even if it is only one column! This library attempts to replicate that behavior.
The best approach is to validate the desired worksheet and ensure it has the expected number of rows or columns. Extracting the range is extremely simple:
var range = XLSX.utils.decode_range(worksheet['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;
The exported write
and writeFile
functions accept an options argument:
Option Name | Default | Description |
---|---|---|
type | | Output data encoding (see Output Type below) |
cellDates | false | Store dates as type d (default is n ) |
bookSST | false | Generate Shared String Table ** |
bookType | "xlsx" | Type of Workbook (see below for supported formats) |
sheet | "" | Name of Worksheet for single-sheet formats ** |
compression | false | Use ZIP compression for ZIP-based formats ** |
Props | | Override workbook properties when writing ** |
themeXLSX | | Override theme XML when writing XLSX/XLSB/XLSM ** |
ignoreEC | true | Suppress "number as text" errors ** |
numbers | | Payload for NUMBERS export ** |
bookSST is slower and more memory intensive, but has better compatibility with older versions of iOS Numbers
cellDates only applies to XLSX output and is not guaranteed to work with third-party readers. Excel itself does not usually write cells with type d so non-Excel tools may ignore the data or error in the presence of dates.
Props is an object mirroring the workbook Props field. See the table from the Workbook File Properties section.
If specified, the string from themeXLSX will be saved as the primary theme for XLSX/XLSB/XLSM files (to xl/theme/theme1.xml in the ZIP)
The writer marks files to ignore "number as text" errors by default; set ignoreEC to false to suppress this behavior.
Due to the size of the Numbers export template, it is not included in the default builds; the xlsx.zahl.js and xlsx.zahl.mjs scripts include the data.
For broad compatibility with third-party tools, this library supports many output formats. The specific file type is controlled with the bookType option:
bookType | file ext | container | sheets | Description |
---|---|---|---|---|
xlsx | .xlsx | ZIP | multi | Excel 2007+ XML Format |
xlsm | .xlsm | ZIP | multi | Excel 2007+ Macro XML Format |
xlsb | .xlsb | ZIP | multi | Excel 2007+ Binary Format |
biff8 | .xls | CFB | multi | Excel 97-2004 Workbook Format |
biff5 | .xls | CFB | multi | Excel 5.0/95 Workbook Format |
biff4 | .xls | none | single | Excel 4.0 Worksheet Format |
biff3 | .xls | none | single | Excel 3.0 Worksheet Format |
biff2 | .xls | none | single | Excel 2.0 Worksheet Format |
xlml | .xls | none | multi | Excel 2003-2004 (SpreadsheetML) |
numbers | .numbers | ZIP | single | Numbers 3.0+ Spreadsheet |
ods | .ods | ZIP | multi | OpenDocument Spreadsheet |
fods | .fods | none | multi | Flat OpenDocument Spreadsheet |
wk3 | .wk3 | none | multi | Lotus Workbook (WK3) |
csv | .csv | none | single | Comma Separated Values |
txt | .txt | none | single | UTF-16 Unicode Text (TXT) |
sylk | .sylk | none | single | Symbolic Link (SYLK) |
html | .html | none | single | HTML Document |
dif | .dif | none | single | Data Interchange Format (DIF) |
dbf | .dbf | none | single | dBASE II + VFP Extensions (DBF) |
wk1 | .wk1 | none | single | Lotus Worksheet (WK1) |
rtf | .rtf | none | single | Rich Text Format (RTF) |
prn | .prn | none | single | Lotus Formatted Text |
eth | .eth | none | single | Ethercalc Record Format (ETH) |
compression only applies to formats with ZIP containers.
Formats that support only a single sheet require a sheet option specifying the worksheet. If the string is empty, the first worksheet is used.
writeFile will automatically guess the output file format based on the file extension if bookType is not specified. It will choose the first format in the aforementioned table that matches the extension.
The type argument for write mirrors the type argument for read:
type | output |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n) ) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | ArrayBuffer, fallback array of 8-bit unsigned int |
"file" | string: path of file that will be created (nodejs only) |
csv
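For example, a sketch of two write calls (file and sheet names are hypothetical):
XLSX.writeFile(wb, "report.xlsx"); // bookType inferred from the extension
XLSX.writeFile(wb, "report.csv", { bookType: "csv", sheet: "Sheet1" }); // single-sheet format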
output will always include the UTF-8 byte order mark.The sheet_to_*
functions accept a worksheet and an optional options object.
The *_to_sheet
functions accept a data object and an optional options object.
The examples are based on the following worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
3 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
XLSX.utils.aoa_to_sheet takes an array of arrays of JS values and returns a worksheet resembling the input data. Numbers, Booleans and Strings are stored as the corresponding cell types. Dates are stored as date cells or numbers. Array holes and explicit undefined values are skipped. null values may be stubbed. All other values are stored as strings. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetStubs | false | Create cell objects of type z for null values |
nullError | false | If true, emit #NULL! error cells for null values |
Examples (click to show)
To generate the example sheet:
var ws = XLSX.utils.aoa_to_sheet([
"SheetJS".split(""),
[1,2,3,4,5,6,7],
[2,3,4,5,6,7,8]
]);
XLSX.utils.sheet_add_aoa
takes an array of arrays of JS values and updates an existing worksheet object. It follows the same process as aoa_to_sheet
and accepts an options argument:
Option Name | Default | Description |
---|---|---|
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetStubs | false | Create cell objects of type z for null values |
nullError | false | If true, emit #NULL! error cells for null values |
origin | Use specified cell as starting point (see below) |
origin
is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
Consider the worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | | | 5 | 6 | 7 |
3 | 2 | 3 | | | 6 | 7 | 8 |
4 | 3 | 4 | | | 7 | 8 | 9 |
5 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
This worksheet can be built up in the order A1:G1, A2:B4, E2:G4, A5:G5:
/* Initial row */
var ws = XLSX.utils.aoa_to_sheet([ "SheetJS".split("") ]);
/* Write data starting at A2 */
XLSX.utils.sheet_add_aoa(ws, [[1,2], [2,3], [3,4]], {origin: "A2"});
/* Write data starting at E2 */
XLSX.utils.sheet_add_aoa(ws, [[5,6,7], [6,7,8], [7,8,9]], {origin:{r:1, c:4}});
/* Append row */
XLSX.utils.sheet_add_aoa(ws, [[4,5,6,7,8,9,0]], {origin: -1});
XLSX.utils.json_to_sheet takes an array of objects and returns a worksheet with automatically-generated "headers" based on the keys of the objects. The default column order is determined by the first appearance of the field using Object.keys. The function accepts an options argument:
Option Name | Default | Description |
---|---|---|
header | | Use specified field order (default Object.keys) ** |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
skipHeader | false | If true, do not include header row in output |
nullError | false | If true, emit #NULL! error cells for null values |
All fields from each row will be written. If header is an array and it does not contain a particular field, the key will be appended to the array.
Cell types are deduced from the type of each value. For example, a Date object will generate a Date cell, while a string will generate a Text cell.
Null values are skipped by default. If nullError is true, an error cell corresponding to #NULL! will be written to the worksheet.
Examples (click to show)
The original sheet cannot be reproduced using plain objects since JS object keys must be unique. After replacing the second e and S with e_1 and S_1:
var ws = XLSX.utils.json_to_sheet([
{ S:1, h:2, e:3, e_1:4, t:5, J:6, S_1:7 },
{ S:2, h:3, e:4, e_1:5, t:6, J:7, S_1:8 }
], {header:["S","h","e","e_1","t","J","S_1"]});
Alternatively, the header row can be skipped:
var ws = XLSX.utils.json_to_sheet([
{ A:"S", B:"h", C:"e", D:"e", E:"t", F:"J", G:"S" },
{ A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7 },
{ A: 2, B: 3, C: 4, D: 5, E: 6, F: 7, G: 8 }
], {header:["A","B","C","D","E","F","G"], skipHeader:true});
XLSX.utils.sheet_add_json
takes an array of objects and updates an existing worksheet object. It follows the same process as json_to_sheet
and accepts an options argument:
Option Name | Default | Description |
---|---|---|
header | | Use specified column order (default Object.keys) |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
skipHeader | false | If true, do not include header row in output |
nullError | false | If true, emit #NULL! error cells for null values |
origin | Use specified cell as starting point (see below) |
origin
is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
Consider the worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | | | 5 | 6 | 7 |
3 | 2 | 3 | | | 6 | 7 | 8 |
4 | 3 | 4 | | | 7 | 8 | 9 |
5 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
This worksheet can be built up in the order A1:G1, A2:B4, E2:G4, A5:G5:
/* Initial row */
var ws = XLSX.utils.json_to_sheet([
{ A: "S", B: "h", C: "e", D: "e", E: "t", F: "J", G: "S" }
], {header: ["A", "B", "C", "D", "E", "F", "G"], skipHeader: true});
/* Write data starting at A2 */
XLSX.utils.sheet_add_json(ws, [
{ A: 1, B: 2 }, { A: 2, B: 3 }, { A: 3, B: 4 }
], {skipHeader: true, origin: "A2"});
/* Write data starting at E2 */
XLSX.utils.sheet_add_json(ws, [
{ A: 5, B: 6, C: 7 }, { A: 6, B: 7, C: 8 }, { A: 7, B: 8, C: 9 }
], {skipHeader: true, origin: { r: 1, c: 4 }, header: [ "A", "B", "C" ]});
/* Append row */
XLSX.utils.sheet_add_json(ws, [
{ A: 4, B: 5, C: 6, D: 7, E: 8, F: 9, G: 0 }
], {header: ["A", "B", "C", "D", "E", "F", "G"], skipHeader: true, origin: -1});
XLSX.utils.table_to_sheet
takes a table DOM element and returns a worksheet resembling the input table. Numbers are parsed. All other data will be stored as strings.
XLSX.utils.table_to_book
produces a minimal workbook based on the worksheet.
Both functions accept options arguments:
Option Name | Default | Description |
---|---|---|
raw | | If true, every cell will hold raw strings |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetRows | 0 | If >0, read the first sheetRows rows of the table |
display | false | If true, hidden rows and cells will not be parsed |
Examples (click to show)
To generate the example sheet, start with the HTML table:
<table id="sheetjs">
<tr><td>S</td><td>h</td><td>e</td><td>e</td><td>t</td><td>J</td><td>S</td></tr>
<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td></tr>
<tr><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td></tr>
</table>
To process the table:
var tbl = document.getElementById('sheetjs');
var wb = XLSX.utils.table_to_book(tbl);
Note: XLSX.read
can handle HTML represented as strings.
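For example, a sketch where htmlstr is a hypothetical string of TABLE markup:
var wb = XLSX.read(htmlstr, { type: "string" }); // parses the HTML table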
XLSX.utils.sheet_add_dom
takes a table DOM element and updates an existing worksheet object. It follows the same process as table_to_sheet
and accepts an options argument:
Option Name | Default | Description |
---|---|---|
raw | | If true, every cell will hold raw strings |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetRows | 0 | If >0, read the first sheetRows rows of the table |
display | false | If true, hidden rows and cells will not be parsed |
origin
is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
A small helper function can create gap rows between tables:
function create_gap_rows(ws, nrows) {
var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
ref.e.r += nrows; // add to ending row
ws["!ref"] = XLSX.utils.encode_range(ref); // reassign row
}
/* first table */
var ws = XLSX.utils.table_to_sheet(document.getElementById('table1'));
create_gap_rows(ws, 1); // one row gap after first table
/* second table */
XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
create_gap_rows(ws, 3); // three rows gap after second table
/* third table */
XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});
XLSX.utils.sheet_to_formulae generates an array of commands that represent how a person would enter data into an application. Each entry is of the form A1-cell-address=formula-or-value. String literals are prefixed with a ' in accordance with Excel.
Examples (click to show)
For the example sheet:
> var o = XLSX.utils.sheet_to_formulae(ws);
> [o[0], o[5], o[10], o[15], o[20]];
[ 'A1=\'S', 'F1=\'J', 'D2=4', 'B3=3', 'G3=8' ]
As an alternative to the writeFile
CSV type, XLSX.utils.sheet_to_csv
also produces CSV output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
FS | "," | "Field Separator" delimiter between fields |
RS | "\n" | "Record Separator" delimiter between rows |
dateNF | FMT 14 | Use specified date format in string output |
strip | false | Remove trailing field separators in each record ** |
blankrows | true | Include blank lines in the CSV output |
skipHidden | false | Skips hidden rows/columns in the CSV output |
forceQuotes | false | Force quotes around fields |
strip will remove trailing commas from each line under default FS/RS
blankrows must be set to false to skip blank lines.
forceQuotes forces all cells to be wrapped in quotes.
XLSX.write with csv type will always prepend the UTF-8 byte-order mark for Excel compatibility. sheet_to_csv returns a JS string and omits the mark. Using XLSX.write with type string will also skip the mark.
Examples (click to show)
For the example sheet:
> console.log(XLSX.utils.sheet_to_csv(ws));
S,h,e,e,t,J,S
1,2,3,4,5,6,7
2,3,4,5,6,7,8
> console.log(XLSX.utils.sheet_to_csv(ws, {FS:"\t"}));
S h e e t J S
1 2 3 4 5 6 7
2 3 4 5 6 7 8
> console.log(XLSX.utils.sheet_to_csv(ws,{FS:":",RS:"|"}));
S:h:e:e:t:J:S|1:2:3:4:5:6:7|2:3:4:5:6:7:8|
The txt output type uses the tab character as the field separator. If the codepage library is available (included in the full distribution but not the core), the output will be encoded in CP1200 and the BOM will be prepended.
XLSX.utils.sheet_to_txt takes the same arguments as sheet_to_csv.
As an alternative to the writeFile
HTML type, XLSX.utils.sheet_to_html
also produces HTML output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
id | | Specify the id attribute for the TABLE element |
editable | false | If true, set contenteditable="true" for every TD |
header | | Override header (default html body) |
footer | | Override footer (default /body /html) |
Examples (click to show)
For the example sheet:
> console.log(XLSX.utils.sheet_to_html(ws));
// ...
XLSX.utils.sheet_to_json
generates different types of JS objects. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
raw | true | Use raw values (true) or formatted strings (false) |
range | from WS | Override Range (see table below) |
header | | Control output format (see table below) |
dateNF | FMT 14 | Use specified date format in string output |
defval | | Use specified value in place of null or undefined |
blankrows | ** | Include blank lines in the output ** |
raw only affects cells which have a format code (.z) field or a formatted text (.w) field.
If header is specified, the first row is considered a data row; if header is not specified, the first row is the header row and not considered data.
When header is not specified, the conversion will automatically disambiguate header entries by affixing _ and a count starting at 1. For example, if three columns have header foo the output fields are foo, foo_1, foo_2.
null values are returned when raw is true but are skipped when false.
If defval is not specified, null and undefined values are skipped normally. If specified, all null and undefined points will be filled with defval.
When header is 1, the default is to generate blank rows. blankrows must be set to false to skip blank rows.
When header is not 1, the default is to skip blank rows. blankrows must be true to generate blank rows.
range is expected to be one of:
range | Description |
---|---|
(number) | Use worksheet range but set starting row to the value |
(string) | Use specified range (A1-style bounded range string) |
(default) | Use worksheet range (ws['!ref'] ) |
header
is expected to be one of:
header | Description |
---|---|
1 | Generate an array of arrays ("2D Array") |
"A" | Row object keys are literal column labels |
array of strings | Use specified strings as keys in row objects |
(default) | Read and disambiguate first row as keys |
If header is not 1, the row object will contain the non-enumerable property __rowNum__ that represents the row of the sheet corresponding to the entry.
Examples (click to show)
For the example sheet:
> XLSX.utils.sheet_to_json(ws);
[ { S: 1, h: 2, e: 3, e_1: 4, t: 5, J: 6, S_1: 7 },
{ S: 2, h: 3, e: 4, e_1: 5, t: 6, J: 7, S_1: 8 } ]
> XLSX.utils.sheet_to_json(ws, {header:"A"});
[ { A: 'S', B: 'h', C: 'e', D: 'e', E: 't', F: 'J', G: 'S' },
{ A: '1', B: '2', C: '3', D: '4', E: '5', F: '6', G: '7' },
{ A: '2', B: '3', C: '4', D: '5', E: '6', F: '7', G: '8' } ]
> XLSX.utils.sheet_to_json(ws, {header:["A","E","I","O","U","6","9"]});
[ { '6': 'J', '9': 'S', A: 'S', E: 'h', I: 'e', O: 'e', U: 't' },
{ '6': '6', '9': '7', A: '1', E: '2', I: '3', O: '4', U: '5' },
{ '6': '7', '9': '8', A: '2', E: '3', I: '4', O: '5', U: '6' } ]
> XLSX.utils.sheet_to_json(ws, {header:1});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ '1', '2', '3', '4', '5', '6', '7' ],
[ '2', '3', '4', '5', '6', '7', '8' ] ]
Example showing the effect of raw:
> ws['A2'].w = "3"; // set A2 formatted string value
> XLSX.utils.sheet_to_json(ws, {header:1, raw:false});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ '3', '2', '3', '4', '5', '6', '7' ], // <-- A2 uses the formatted string
[ '2', '3', '4', '5', '6', '7', '8' ] ]
> XLSX.utils.sheet_to_json(ws, {header:1});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ 1, 2, 3, 4, 5, 6, 7 ], // <-- A2 uses the raw value
[ 2, 3, 4, 5, 6, 7, 8 ] ]
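The defval and blankrows options are not shown above; as a minimal sketch reusing the same worksheet:
/* fill missing cells with "" instead of skipping them, and keep blank rows */
XLSX.utils.sheet_to_json(ws, {defval: "", blankrows: true});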
Despite the library name xlsx, it supports numerous spreadsheet file formats:
Format | Read | Write |
---|---|---|
Excel Worksheet/Workbook Formats | | |
Excel 2007+ XML Formats (XLSX/XLSM) | ✔ | ✔ |
Excel 2007+ Binary Format (XLSB BIFF12) | ✔ | ✔ |
Excel 2003-2004 XML Format (XML "SpreadsheetML") | ✔ | ✔ |
Excel 97-2004 (XLS BIFF8) | ✔ | ✔ |
Excel 5.0/95 (XLS BIFF5) | ✔ | ✔ |
Excel 4.0 (XLS/XLW BIFF4) | ✔ | ✔ |
Excel 3.0 (XLS BIFF3) | ✔ | ✔ |
Excel 2.0/2.1 (XLS BIFF2) | ✔ | ✔ |
Excel Supported Text Formats | | |
Delimiter-Separated Values (CSV/TXT) | ✔ | ✔ |
Data Interchange Format (DIF) | ✔ | ✔ |
Symbolic Link (SYLK/SLK) | ✔ | ✔ |
Lotus Formatted Text (PRN) | ✔ | ✔ |
UTF-16 Unicode Text (TXT) | ✔ | ✔ |
Other Workbook/Worksheet Formats | | |
Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS) | ✔ | ✔ |
OpenDocument Spreadsheet (ODS) | ✔ | ✔ |
Flat XML ODF Spreadsheet (FODS) | ✔ | ✔ |
Uniform Office Format Spreadsheet (标文通 UOS1/UOS2) | ✔ | |
dBASE II/III/IV / Visual FoxPro (DBF) | ✔ | ✔ |
Lotus 1-2-3 (WK1/WK3) | ✔ | ✔ |
Lotus 1-2-3 (WKS/WK2/WK4/123) | ✔ | |
Quattro Pro Spreadsheet (WQ1/WQ2/WB1/WB2/WB3/QPW) | ✔ | |
Works 1.x-3.x DOS / 2.x-5.x Windows Spreadsheet (WKS) | ✔ | |
Works 6.x-9.x Spreadsheet (XLR) | ✔ | |
Other Common Spreadsheet Output Formats | | |
HTML Tables | ✔ | ✔ |
Rich Text Format tables (RTF) | ✔ | |
Ethercalc Record Format (ETH) | ✔ | ✔ |
Features not supported by a given file format will not be written. Formats with range limits will be silently truncated:
Format | Last Cell | Max Cols | Max Rows |
---|---|---|---|
Excel 2007+ XML Formats (XLSX/XLSM) | XFD1048576 | 16384 | 1048576 |
Excel 2007+ Binary Format (XLSB BIFF12) | XFD1048576 | 16384 | 1048576 |
Numbers 12.0 (NUMBERS) | ALL1000000 | 1000 | 1000000 |
Excel 97-2004 (XLS BIFF8) | IV65536 | 256 | 65536 |
Excel 5.0/95 (XLS BIFF5) | IV16384 | 256 | 16384 |
Excel 4.0 (XLS BIFF4) | IV16384 | 256 | 16384 |
Excel 3.0 (XLS BIFF3) | IV16384 | 256 | 16384 |
Excel 2.0/2.1 (XLS BIFF2) | IV16384 | 256 | 16384 |
Lotus 1-2-3 R2 - R5 (WK1/WK3/WK4) | IV8192 | 256 | 8192 |
Lotus 1-2-3 R1 (WKS) | IV2048 | 256 | 2048 |
Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.
File Format Details (click to show)
Core Spreadsheet Formats
XLSX and XLSM files are ZIP containers containing a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.
The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does not follow the specification, and there are additional documents discussing how Excel deviates from the specification.
BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced the concept of a workbook (XLW
files) but also had single-sheet XLS
format. The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12 extended the format in various ways but largely stuck to the same record format.
There is no official specification for any of these formats. Excel 95 can write files in these formats, so record lengths and fields were determined by writing in all of the supported formats and comparing files. Excel 2016 can generate BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.
BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.
The MS-XLS
specification covers the basics of the file format, and other specifications expand on serialization of features like properties.
Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although MS has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is pretty straightforward.
Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
The MS-XLSB
specification covers the basics of the file format, and other specifications expand on serialization of features like properties.
Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel although they may not work in RFC4180 compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.
Excel TXT uses tab as the delimiter and code page 1200.
Like in Excel, files starting with 0x49 0x44 ("ID")
are treated as Symbolic Link files. Unlike Excel, if the file does not have a valid SYLK header, it will be proactively reinterpreted as CSV. There are some files with semicolon delimiter that align with a valid SYLK file. For the broadest compatibility, all cells with the value of ID
are automatically wrapped in double-quotes.
Miscellaneous Workbook Formats
Support for other formats is generally far behind XLS/XLSB/XLSX support, due in part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.
The Lotus formats consist of binary records similar to the BIFF structure. Lotus did release a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing to Excel support.
Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.
Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.
The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like BIFF8 XLS.
All versions of Works were limited to a single worksheet.
Works for DOS 1.x - 3.x and Works for Windows 2.x extends the Lotus WKS format with additional record types.
Works for Windows 3.x - 5.x uses the same format and WKS extension. The BOF record has type FF
Works for Windows 6.x - 9.x use the XLR format. XLR is nearly identical to BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves the exact Workbook stream for the XLR and the 97-2003 XLS export. Works 6 XLS includes two empty worksheets but the main worksheet has an identical encoding. XLR also includes a WksSSWorkBook
stream similar to Lotus FM3/FMT files.
iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary XML-based format to the current file format based on the iWork Archive (IWA). This format has been used up through the current release (Numbers 11.2).
The parser focuses on extracting raw data from tables. Numbers technically supports multiple tables in a logical worksheet, including custom titles. This parser will generate one worksheet per Numbers table.
The writer currently exports a small range from the first worksheet.
ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO add undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.
UOS is a very similar format, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.
Miscellaneous Worksheet Formats
Many older formats supported only one worksheet:
DBF is really a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer makes files compatible with Visual FoxPro extensions.
Multi-file extensions like external memos and tables are currently unsupported, limited by the general ability to read arbitrary files in the web browser. The reader understands DBF Level 7 extensions like DATETIME.
Symbolic Link (SYLK)
There is no real documentation. All knowledge was gathered by saving files in various versions of Excel to deduce the meaning of fields. Notes:
Plain formulae are stored in the RC form.
Column widths are rounded to integral characters.
Lotus Formatted Text (PRN)
There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless we can guess the column widths and reverse-engineer the original layout. Excel's 240 character width limitation is not enforced.
Data Interchange Format (DIF)
There is no unified definition. Visicalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:
Since Excel automatically converts numbers-as-strings to numbers, numeric string constants are converted to formulae: "0.3" -> "=""0.3""
DIF technically expects numeric cells to hold the raw numeric data, but Excel permits formatted numbers (including dates)
DIF technically has no support for formulae, but Excel will automatically convert plain formulae. Array formulae are not preserved.
HTML
Excel HTML worksheets include special metadata encoded in styles. For example, mso-number-format
is a localized string containing the number format. Despite the metadata the output is valid HTML, although it does accept bare &
symbols.
The writer adds type metadata to the TD elements via the t
tag. The parser looks for those tags and overrides the default interpretation. For example, text like <td>12345</td>
will be parsed as numbers but <td t="s">12345</td>
will be parsed as text.
Excel RTF worksheets are stored in the clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.
Ethercalc is an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.
make test
will run the node-based tests. By default it runs tests on files in every supported format. To test a specific file type, set FMTS
to the format you want to test. Feature-specific tests are available with make test_misc
$ make test_misc # run core tests
$ make test # run full tests
$ make test_xls # only use the XLS test files
$ make test_xlsx # only use the XLSX test files
$ make test_xlsb # only use the XLSB test files
$ make test_xml # only use the XML test files
$ make test_ods # only use the ODS test files
To enable all errors, set the environment variable WTF=1:
$ make test # run full tests
$ WTF=1 make test # enable all error messages
flow and eslint checks are available:
$ make lint # eslint checks
$ make flow # make lint + Flow checking
$ make tslint # check TS definitions
The core in-browser tests are available at tests/index.html
within this repo. Start a local server and navigate to that directory to run the tests. make ctestserv
will start a server on port 8000.
make ctest
will generate the browser fixtures. To add more files, edit the tests/fixtures.lst
file and add the paths.
To run the full in-browser tests, clone the repo for oss.sheetjs.com
and replace the xlsx.js
file (then open a browser window and go to stress.html
):
$ cp xlsx.js ../SheetJS.github.io
$ cd ../SheetJS.github.io
$ simplehttpserver # or "python -mSimpleHTTPServer" or "serve"
$ open -a Chromium.app http://localhost:8000/stress.html
Tested NodeJS versions: 0.8, 0.10, 0.12, 4.x, 5.x, 6.x, 7.x and 8.x.
Tests utilize the mocha testing framework.
The test suite also includes tests for various time zones. To change the timezone locally, set the TZ environment variable:
$ env TZ="Asia/Kolkata" WTF=1 make test_misc
Test files are housed in another repo.
Running make init
will refresh the test_files
submodule and get the files. Note that this requires svn
, git
, hg
and other commands that may not be available. If make init
fails, please download the latest version of the test files snapshot from the repo
Latest Snapshot (click to show)
Latest test files snapshot: http://github.com/SheetJS/test_files/releases/download/20170409/test_files.zip
(download and unzip to the test_files
subdirectory)
Due to the precarious nature of the Open Specifications Promise, it is very important to ensure code is cleanroom.
Contribution Notes
File organization (click to show)
At a high level, the final script is a concatenation of the individual files in the bits
folder. Running make
should reproduce the final output on all platforms. The README is similarly split into bits in the docbits
folder.
Folders:
folder | contents |
---|---|
bits | raw source files that make up the final script |
docbits | raw markdown files that make up README.md |
bin | server-side bin scripts (xlsx.njs ) |
dist | dist files for web browsers and nonstandard JS environments |
demos | demo projects for platforms like ExtendScript and Webpack |
tests | browser tests (run make ctest to rebuild) |
types | typescript definitions and tests |
misc | miscellaneous supporting scripts |
test_files | test files (pulled from the test files repository) |
After cloning the repo, running make help
will display a list of commands.
The xlsx.js
file is constructed from the files in the bits
subdirectory. The build script (run make
) will concatenate the individual bits to produce the script. Before submitting a contribution, ensure that running make will produce the xlsx.js
file exactly. The simplest way to test is to add the script:
$ git add xlsx.js
$ make clean
$ make
$ git diff xlsx.js
To produce the dist files, run make dist
. The dist files are updated in each version release and should not be committed between versions.
The included make.cmd
script will build xlsx.js
from the bits
directory. Building is as simple as:
> make
To prepare development environment:
> make init
The full list of commands available in Windows is displayed by make help:
make init -- install deps and global modules
make lint -- run eslint linter
make test -- run mocha test suite
make misc -- run smaller test suite
make book -- rebuild README and summary
make help -- display this message
As explained in Test Files, on Windows the release ZIP file must be downloaded and extracted. If Bash on Windows is available, it is possible to run the OSX/Linux workflow. The following steps prepare the environment:
# Install support programs for the build and test commands
sudo apt-get install make git subversion mercurial
# Install nodejs and NPM within the WSL
wget -qO- https://deb.nodesource.com/setup_8.x | sudo bash
sudo apt-get install nodejs
# Install dev dependencies
sudo npm install -g mocha voc blanket xlsjs
The test_misc
target (make test_misc
on Linux/OSX / make misc
on Windows) runs the targeted feature tests. It should take 5-10 seconds to perform feature tests without testing against the entire test battery. New features should be accompanied with tests for the relevant file formats and features.
For tests involving the read side, an appropriate feature test would involve reading an existing file and checking the resulting workbook object. If a parameter is involved, files should be read with different values to verify that the feature is working as expected.
For tests involving a new write feature which can already be parsed, appropriate feature tests would involve writing a workbook with the feature and then opening and verifying that the feature is preserved.
For tests involving a new write feature without an existing read ability, please add a feature test to the kitchen sink tests/write.js
.
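As a rough sketch of a read-side feature test in the mocha style used by the suite (the fixture path and assertion here are hypothetical):
var assert = require("assert");
var XLSX = require("xlsx");
describe("cellDates", function() {
  it("generates date cells when cellDates is set", function() {
    /* hypothetical fixture: a file whose A1 cell holds a date */
    var wb = XLSX.readFile("test_files/example.xlsx", {cellDates: true});
    var ws = wb.Sheets[wb.SheetNames[0]];
    assert.equal(ws["A1"].t, "d");
  });
});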
OSP-covered Specifications (click to show)
MS-CFB: Compound File Binary File Format
MS-CTXLS: Excel Custom Toolbar Binary File Format
MS-EXSPXML3: Excel Calculation Version 2 Web Service XML Schema
MS-ODATA: Open Data Protocol (OData)
MS-ODRAW: Office Drawing Binary File Format
MS-ODRAWXML: Office Drawing Extensions to Office Open XML Structure
MS-OE376: Office Implementation Information for ECMA-376 Standards Support
MS-OFFCRYPTO: Office Document Cryptography Structure
MS-OI29500: Office Implementation Information for ISO/IEC 29500 Standards Support
MS-OLEDS: Object Linking and Embedding (OLE) Data Structures
MS-OLEPS: Object Linking and Embedding (OLE) Property Set Data Structures
MS-OODF3: Office Implementation Information for ODF 1.2 Standards Support
MS-OSHARED: Office Common Data Types and Objects Structures
MS-OVBA: Office VBA File Format Structure
MS-XLDM: Spreadsheet Data Model File Format
MS-XLS: Excel Binary File Format (.xls) Structure Specification
MS-XLSB: Excel (.xlsb) Binary File Format
MS-XLSX: Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format
XLS: Microsoft Office Excel 97-2007 Binary File Format Specification
RTF: Rich Text Format
Browser Test and Support Matrix
Supported File Formats
Author: SheetJS
Source Code: https://github.com/SheetJS/sheetjs
License: Apache-2.0 License
The complete browser standalone build is saved to dist/xlsx.full.min.js
and can be directly added to a page with a script
tag:
<script lang="javascript" src="dist/xlsx.full.min.js"></script>
CDN Availability (click to show)
CDN | URL |
---|---|
unpkg | https://unpkg.com/xlsx/ |
jsDelivr | https://jsdelivr.com/package/npm/xlsx |
CDNjs | https://cdnjs.com/libraries/xlsx |
packd | https://bundle.run/xlsx@latest?name=XLSX |
For example, unpkg
makes the latest version available at:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
Webpack and Browserify builds include optional modules by default. Webpack can be configured to remove support with resolve.alias
:
/* uncomment the lines below to remove support */
resolve: {
alias: { "./dist/cpexcel.js": "" } // <-- omit international support
}
With npm:
$ npm install xlsx
With bower:
$ bower install js-xlsx
dist/xlsx.extendscript.js
is an ExtendScript build for Photoshop and InDesign that is included in the npm
package. It can be directly referenced with a #include
directive:
#include "xlsx.extendscript.js"
Most scenarios involving spreadsheets and data can be broken into 5 parts:
Acquire Data: Data may be stored anywhere: local or remote files, databases, HTML TABLE, or even generated programmatically in the web browser.
Extract Data: For spreadsheet files, this involves parsing raw bytes to read the cell data. For general JS data, this involves reshaping the data.
Process Data: From generating summary statistics to cleaning data records, this step is the heart of the problem.
Package Data: This can involve making a new spreadsheet or serializing with JSON.stringify
or writing XML or simply flattening data for UI tools.
Release Data: Spreadsheet files can be uploaded to a server or written locally. Data can be presented to users in an HTML TABLE or data grid.
A common problem involves generating a valid spreadsheet export from data stored in an HTML table. In this example, an HTML TABLE on the page will be scraped, a row will be added to the bottom with the date of the report, and a new file will be generated and downloaded locally. XLSX.writeFile
takes care of packaging the data and attempting a local download:
// Acquire Data (reference to the HTML table)
var table_elt = document.getElementById("my-table-id");
// Extract Data (create a workbook object from the table)
var workbook = XLSX.utils.table_to_book(table_elt);
// Process Data (add a new row)
var ws = workbook.Sheets["Sheet1"];
XLSX.utils.sheet_add_aoa(ws, [["Created "+new Date().toISOString()]], {origin:-1});
// Package and Release Data (`writeFile` tries to write and save an XLSB file)
XLSX.writeFile(workbook, "Report.xlsb");
This library tries to simplify steps 2 and 4 with functions to extract useful data from spreadsheet files (read
/ readFile
) and generate new spreadsheet files from data (write
/ writeFile
). Additional utility functions like table_to_book
work with other common data sources like HTML tables.
This documentation and various demo projects cover a number of common scenarios and approaches for steps 1 and 5.
Utility functions help with step 3.
Data processing should fit in any workflow
The library does not impose a separate lifecycle. It fits nicely in websites and apps built using any framework. The plain JS data objects play nice with Web Workers and future APIs.
"Acquiring and Extracting Data" describes solutions for common data import scenarios.
"Writing Workbooks" describes solutions for common data export scenarios involving actual spreadsheet files.
"Utility Functions" details utility functions for translating JSON Arrays and other common JS structures into worksheet objects.
JavaScript is a powerful language for data processing
The "Common Spreadsheet Format" is a simple object representation of the core concepts of a workbook. The various functions in the library provide low-level tools for working with the object.
For friendly JS processing, there are utility functions for converting parts of a worksheet to/from an Array of Arrays. The following example combines powerful JS Array methods with a network request library to download data, select the information we want and create a workbook file:
Get Data from a JSON Endpoint and Generate a Workbook (click to show)
The goal is to generate an XLSX workbook of US President names and birthdays.
Acquire Data
Raw Data
https://theunitedstates.io/congress-legislators/executive.json has the desired data. For example, John Adams:
{
"id": { /* (data omitted) */ },
"name": {
"first": "John", // <-- first name
"last": "Adams" // <-- last name
},
"bio": {
"birthday": "1735-10-19", // <-- birthday
"gender": "M"
},
"terms": [
{ "type": "viceprez", /* (other fields omitted) */ },
{ "type": "viceprez", /* (other fields omitted) */ },
{ "type": "prez", /* (other fields omitted) */ } // <-- look for "prez"
]
}
Filtering for Presidents
The dataset includes Aaron Burr, a Vice President who was never President!
Array#filter
creates a new array with the desired rows. A President served at least one term with type
set to "prez"
. To test if a particular row has at least one "prez"
term, Array#some
is another native JS function. The complete filter would be:
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
Lining up the data
For this example, the name will be the first name combined with the last name (row.name.first + " " + row.name.last
) and the birthday will be the subfield row.bio.birthday
. Using Array#map
, the dataset can be massaged in one call:
const rows = prez.map(row => ({
name: row.name.first + " " + row.name.last,
birthday: row.bio.birthday
}));
The result is an array of "simple" objects with no nesting:
[
{ name: "George Washington", birthday: "1732-02-22" },
{ name: "John Adams", birthday: "1735-10-19" },
// ... one row per President
]
Extract Data
With the cleaned dataset, XLSX.utils.json_to_sheet
generates a worksheet:
const worksheet = XLSX.utils.json_to_sheet(rows);
XLSX.utils.book_new
creates a new workbook and XLSX.utils.book_append_sheet
appends a worksheet to the workbook. The new worksheet will be called "Dates":
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
Process Data
Fixing headers
By default, json_to_sheet
creates a worksheet with a header row. In this case, the headers come from the JS object keys: "name" and "birthday".
The headers are in cells A1 and B1. XLSX.utils.sheet_add_aoa
can write text values to the existing worksheet starting at cell A1:
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
Fixing Column Widths
Some of the names are longer than the default column width. Column widths are set by setting the "!cols"
worksheet property.
The following line sets the width of column A to approximately 10 characters:
worksheet["!cols"] = [ { wch: 10 } ]; // set column A width to 10 characters
One Array#reduce
call over rows
can calculate the maximum width:
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
worksheet["!cols"] = [ { wch: max_width } ];
Note: If the starting point was a file or HTML table, XLSX.utils.sheet_to_json
will generate an array of JS objects.
Package and Release Data
XLSX.writeFile
creates a spreadsheet file and tries to write it to the system. In the browser, it will try to prompt the user to download the file. In NodeJS, it will write to the local directory.
XLSX.writeFile(workbook, "Presidents.xlsx");
Complete Example
// Uncomment the next line for use in NodeJS:
// const XLSX = require("xlsx"), axios = require("axios");
(async() => {
/* fetch JSON data and parse */
const url = "https://theunitedstates.io/congress-legislators/executive.json";
const raw_data = (await axios(url, {responseType: "json"})).data;
/* filter for the Presidents */
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
/* flatten objects */
const rows = prez.map(row => ({
name: row.name.first + " " + row.name.last,
birthday: row.bio.birthday
}));
/* generate worksheet and workbook */
const worksheet = XLSX.utils.json_to_sheet(rows);
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
/* fix headers */
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
/* calculate column width */
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
worksheet["!cols"] = [ { wch: max_width } ];
/* create an XLSX file and try to save to Presidents.xlsx */
XLSX.writeFile(workbook, "Presidents.xlsx");
})();
For use in the web browser, assuming the snippet is saved to snippet.js
, script tags should be used to include the axios
and xlsx
standalone builds:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
<script src="snippet.js"></script>
File formats are implementation details
The parser covers a wide gamut of common spreadsheet file formats to ensure that "HTML-saved-as-XLS" files work as well as actual XLS or XLSX files.
The writer supports a number of common output formats for broad compatibility with the data ecosystem.
To the greatest extent possible, data processing code should not have to worry about the specific file formats involved.
The demos
directory includes sample projects for:
Frameworks and APIs
angularjs
angular and ionic
knockout
meteor
react and react-native
vue 2.x and weex
XMLHttpRequest and fetch
nodejs server
databases and key/value stores
typed arrays and math
Bundlers and Tooling
Platforms and Integrations
electron application
nw.js application
Chrome / Chromium extensions
Adobe ExtendScript
Headless Browsers
canvas-datagrid
x-spreadsheet
Swift JSC and other engines
"serverless" functions
internet explorer
Other examples are included in the showcase.
Extract data from spreadsheet bytes
var workbook = XLSX.read(data, opts);
The read
method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array
or ArrayBuffer
).
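As a rough sketch, the type option tells read how to interpret the input (the variables here are placeholders for data acquired elsewhere):
var wb1 = XLSX.read(u8, {type: "array"});    // u8 is a Uint8Array or ArrayBuffer
var wb2 = XLSX.read(buf, {type: "buffer"});  // buf is a NodeJS Buffer
var wb3 = XLSX.read(bstr, {type: "binary"}); // bstr is a binary string
var wb4 = XLSX.read(str, {type: "string"});  // str is a JS string (CSV or HTML)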
Read spreadsheet bytes from a local file and extract data
var workbook = XLSX.readFile(filename, opts);
The readFile
method attempts to read a spreadsheet file at the supplied path. Browsers generally do not allow reading files in this way (it is deemed a security risk), and attempts to read files in this way will throw an error.
The second opts
argument is optional. "Parsing Options" covers the supported properties and behaviors.
Here are a few common scenarios (click on each subtitle to see the code):
Local file in a NodeJS server (click to show)
readFile
uses fs.readFileSync
under the hood:
var XLSX = require("xlsx");
var workbook = XLSX.readFile("test.xlsx");
For Node ESM, the readFile
helper is not enabled. Instead, fs.readFileSync
should be used to read the file data as a Buffer
for use with XLSX.read
:
import { readFileSync } from "fs";
import { read } from "xlsx/xlsx.mjs";
const buf = readFileSync("test.xlsx");
/* buf is a Buffer */
const workbook = read(buf);
User-submitted file in a web page ("Drag-and-Drop") (click to show)
For modern websites targeting Chrome 76+, File#arrayBuffer
is recommended:
// XLSX is a global from the standalone script
async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);
For maximal compatibility, the FileReader
API should be used:
function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);
https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
User-submitted file with an HTML INPUT element (click to show)
Starting with an HTML INPUT element with type="file"
:
<input type="file" id="input_dom_element">
For modern websites targeting Chrome 76+, Blob#arrayBuffer
is recommended:
// XLSX is a global from the standalone script
async function handleFileAsync(e) {
const file = e.target.files[0];
const data = await file.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener("change", handleFileAsync, false);
For broader support (including IE10+), the FileReader
approach is recommended:
function handleFile(e) {
var file = e.target.files[0];
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(file);
}
input_dom_element.addEventListener("change", handleFile, false);
The oldie
demo shows an IE-compatible fallback scenario.
Fetching a file in the web browser ("Ajax") (click to show)
For modern websites targeting Chrome 42+, fetch
is recommended:
// XLSX is a global from the standalone script
(async() => {
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
})();
For broader support, the XMLHttpRequest
approach is recommended:
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
};
req.send();
The xhr
demo includes a longer discussion and more examples.
http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
Local file in a PhotoShop or InDesign plugin (click to show)
readFile
wraps the File
logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
The extendscript
demo includes a more complex example.
Local file in an Electron app (click to show)
readFile
can be used in the renderer process:
/* From the renderer process */
var XLSX = require("xlsx");
var workbook = XLSX.readFile(path);
Electron APIs have changed over time. The electron
demo shows a complete example and details the required version-specific settings.
Local file in a mobile app with React Native (click to show)
The react
demo includes a sample React Native app.
Since React Native does not provide a way to read files from the filesystem, a third-party library must be used. The following libraries have been tested:
The base64
encoding returns strings compatible with the base64
type:
import XLSX from "xlsx";
import { FileSystem } from "react-native-file-access";
const b64 = await FileSystem.readFile(path, "base64");
/* b64 is a base64 string */
const workbook = XLSX.read(b64, {type: "base64"});
The ascii
encoding returns binary strings compatible with the binary
type:
import XLSX from "xlsx";
import { readFile } from "react-native-fs";
const bstr = await readFile(path, "ascii");
/* bstr is a binary string */
const workbook = XLSX.read(bstr, {type: "binary"});
NodeJS Server File Uploads (click to show)
read
can accept a NodeJS buffer. readFile
can read files generated by a HTTP POST request body parser like formidable
:
const XLSX = require("xlsx");
const http = require("http");
const formidable = require("formidable");
const server = http.createServer((req, res) => {
const form = new formidable.IncomingForm();
form.parse(req, (err, fields, files) => {
/* grab the first file */
const f = Object.entries(files)[0][1];
const path = f.filepath;
const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
});
}).listen(process.env.PORT || 7262);
The server
demo has more advanced examples.
Download files in a NodeJS process (click to show)
Node 17.5 and 18.0 have native support for fetch:
const XLSX = require("xlsx");
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
For broader compatibility, third-party modules are recommended.
request
requires a null
encoding to yield Buffers:
var XLSX = require("xlsx");
var request = require("request");
request({url: url, encoding: null}, function(err, resp, body) {
var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
});
axios
works the same way in browser and in NodeJS:
const XLSX = require("xlsx");
const axios = require("axios");
(async() => {
const res = await axios.get(url, {responseType: "arraybuffer"});
/* res.data is a Buffer */
const workbook = XLSX.read(res.data);
/* DO SOMETHING WITH workbook HERE */
})();
Download files in an Electron app (click to show)
The net
module in the main process can make HTTP/HTTPS requests to external resources. Responses should be manually concatenated using Buffer.concat
:
const XLSX = require("xlsx");
const { net } = require("electron");
const req = net.request(url);
req.on("response", (res) => {
const bufs = []; // this array will collect all of the buffers
res.on("data", (chunk) => { bufs.push(chunk); });
res.on("end", () => {
const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
});
});
req.end();
Readable Streams in NodeJS (click to show)
When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end:
var fs = require("fs");
var XLSX = require("xlsx");
function process_RS(stream, cb) {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer, {type:"buffer"});
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
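A usage sketch (the file name here is hypothetical):
process_RS(fs.createReadStream("report.xlsx"), function(workbook) {
  /* DO SOMETHING WITH workbook HERE */
});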
ReadableStream in the browser (click to show)
When dealing with ReadableStream
, the easiest approach is to buffer the stream and process the whole thing at the end:
// XLSX is a global from the standalone script
async function process_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}
/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}
return out;
}
const data = await process_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);
More detailed examples are covered in the included demos
JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks:
Create a new Worksheet
var workbook = XLSX.utils.book_new();
The book_new
utility function creates an empty workbook with no worksheets.
Append a Worksheet to a Workbook
XLSX.utils.book_append_sheet(workbook, worksheet, sheet_name);
The book_append_sheet
utility function appends a worksheet to the workbook. The third argument specifies the desired worksheet name. Multiple worksheets can be added to a workbook by calling the function multiple times.
Create a worksheet from an array of arrays of JS values
var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
The aoa_to_sheet
utility function walks an "array of arrays" in row-major order, generating a worksheet object. The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:
var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
])
"Array of Arrays Input" describes the function and the optional opts
argument in more detail.
Create a worksheet from an array of JS objects
var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
The json_to_sheet
utility function walks an array of JS objects in order, generating a worksheet object. By default, it will generate a header row and one row per object in the array. The optional opts
argument has settings to control the column order and header output.
"Array of Objects Input" describes the function and the optional opts
argument in more detail.
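For example, a minimal sketch pinning the column order (the field names are illustrative):
/* emit the birthday column before the name column */
var worksheet = XLSX.utils.json_to_sheet(rows, {header: ["birthday", "name"]});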
"Zen of SheetJS" contains a detailed example "Get Data from a JSON Endpoint and Generate a Workbook"
The database
demo includes examples of working with databases and query results.
Create a worksheet by scraping an HTML TABLE in the page
var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
The table_to_sheet
utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. The opts
argument is optional. "HTML Table Input" describes the function in more detail.
Create a workbook by scraping an HTML TABLE in the page
var workbook = XLSX.utils.table_to_book(dom_element, opts);
The table_to_book
utility function follows the same logic as table_to_sheet
. After generating a worksheet, it creates a blank workbook and appends the spreadsheet.
The options argument supports the same options as table_to_sheet
, with the addition of a sheet
property to control the worksheet name. If the property is missing or no options are specified, the default name Sheet1
is used.
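A minimal sketch overriding the default worksheet name:
var workbook = XLSX.utils.table_to_book(dom_element, {sheet: "Data"});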
Here are a few common scenarios (click on each subtitle to see the code):
HTML TABLE element in a webpage (click to show)
<!-- include the standalone script and shim. this uses the UNPKG CDN -->
<script src="https://unpkg.com/xlsx/dist/shim.min.js"></script>
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>
<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
/* DO SOMETHING WITH workbook HERE */
</script>
Multiple tables on a web page can be converted to individual worksheets:
/* create new workbook */
var workbook = XLSX.utils.book_new();
/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
/* workbook now has 2 worksheets */
Alternatively, the HTML code can be extracted and parsed:
var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)
The chrome
demo shows a complete example and details the required permissions and other settings.
In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:
/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
The full object format is described later in this README.
Reading a specific cell (click to show)
This example extracts the value stored in cell A1 from the first worksheet:
var first_sheet_name = workbook.SheetNames[0];
var address_of_cell = 'A1';
/* Get worksheet */
var worksheet = workbook.Sheets[first_sheet_name];
/* Find desired cell */
var desired_cell = worksheet[address_of_cell];
/* Get the value */
var desired_value = (desired_cell ? desired_cell.v : undefined);
Adding a new worksheet to a workbook (click to show)
This example uses XLSX.utils.aoa_to_sheet
to make a sheet and XLSX.utils.book_append_sheet
to append the sheet to the workbook:
var ws_name = "SheetJS";
/* make worksheet */
var ws_data = [
[ "S", "h", "e", "e", "t", "J", "S" ],
[ 1 , 2 , 3 , 4 , 5 ]
];
var ws = XLSX.utils.aoa_to_sheet(ws_data);
/* Add the worksheet to the workbook */
XLSX.utils.book_append_sheet(wb, ws, ws_name);
Creating a new workbook from scratch (click to show)
The workbook object contains a SheetNames
array of names and a Sheets
object mapping sheet names to sheet objects. The XLSX.utils.book_new
utility function creates a new workbook object:
/* create a new blank workbook */
var wb = XLSX.utils.book_new();
The new workbook is blank and contains no worksheets. The write functions will error if the workbook is empty.
https://sheetjs.com/demos/modify.html read + modify + write files
https://github.com/SheetJS/sheetjs/blob/HEAD/bin/xlsx.njs node
The node version installs a command line tool xlsx
which can read spreadsheet files and output the contents in various formats. The source is available at xlsx.njs
in the bin directory.
Some helper functions in XLSX.utils
generate different views of the sheets:
XLSX.utils.sheet_to_csv generates CSV
XLSX.utils.sheet_to_txt generates UTF16 Formatted Text
XLSX.utils.sheet_to_html generates HTML
XLSX.utils.sheet_to_json generates an array of objects
XLSX.utils.sheet_to_formulae generates a list of formulae
For writing, the first step is to generate output data. The helper functions write and writeFile will produce the data in various formats suitable for dissemination. The second step is to actually share the data with the end point. Assuming workbook is a workbook object:
nodejs write a file (click to show)
XLSX.writeFile
uses fs.writeFileSync
in server environments:
if(typeof require !== 'undefined') XLSX = require('xlsx');
/* output format determined by filename */
XLSX.writeFile(workbook, 'out.xlsb');
/* at this point, out.xlsb is a file that you can distribute */
Photoshop ExtendScript write a file (click to show)
writeFile
wraps the File
logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* output format determined by filename */
XLSX.writeFile(workbook, 'out.xlsx');
/* at this point, out.xlsx is a file that you can distribute */
The extendscript
demo includes a more complex example.
Browser add TABLE element to page (click to show)
The sheet_to_html
utility function generates HTML code that can be added to any DOM element.
var worksheet = workbook.Sheets[workbook.SheetNames[0]];
var container = document.getElementById('tableau');
container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
Browser upload file (ajax) (click to show)
A complete example using XHR is included in the XHR demo, along with examples for fetch and wrapper libraries. This example assumes the server can handle Base64-encoded files (see the demo for a basic nodejs server):
/* in this example, send a base64 string to the server */
var wopts = { bookType:'xlsx', bookSST:false, type:'base64' };
var wbout = XLSX.write(workbook,wopts);
var req = new XMLHttpRequest();
req.open("POST", "/upload", true);
var formdata = new FormData();
formdata.append('file', 'test.xlsx'); // <-- server expects `file` to hold name
formdata.append('data', wbout); // <-- `data` holds the base64-encoded data
req.send(formdata);
Browser save file (click to show)
XLSX.writeFile
wraps a few techniques for triggering a file save:
The URL browser API creates an object URL for the file, which the library uses by creating a link and forcing a click. It is supported in modern browsers.
msSaveBlob is an IE10+ API for triggering a file save.
IE_FileSave uses VBScript and ActiveX to write a file in IE6+ for Windows XP and Windows 7. The shim must be included in the containing HTML page.
There is no standard way to determine if the actual file has been downloaded.
/* output format determined by filename */
XLSX.writeFile(workbook, 'out.xlsb');
/* at this point, out.xlsb will have been downloaded */
Browser save file (compatibility) (click to show)
XLSX.writeFile
techniques work for most modern browsers as well as older IE. For much older browsers, there are workarounds implemented by wrapper libraries.
FileSaver.js
implements saveAs
. Note: XLSX.writeFile
will automatically call saveAs
if available.
/* bookType can be any supported output type */
var wopts = { bookType:'xlsx', bookSST:false, type:'array' };
var wbout = XLSX.write(workbook,wopts);
/* the saveAs call downloads a file on the local machine */
saveAs(new Blob([wbout],{type:"application/octet-stream"}), "test.xlsx");
Downloadify
uses a Flash SWF button to generate local files, suitable for environments where ActiveX is unavailable:
Downloadify.create(id,{
/* other options are required! read the downloadify docs for more info */
filename: "test.xlsx",
data: function() { return XLSX.write(wb, {bookType:"xlsx", type:'base64'}); },
append: false,
dataType: 'base64'
});
The oldie
demo shows an IE-compatible fallback scenario.
The included demos cover mobile apps and other special deployments.
The streaming write functions are available in the XLSX.stream
object. They take the same arguments as the normal write functions but return a Readable Stream. They are only exposed in NodeJS.
XLSX.stream.to_csv is the streaming version of XLSX.utils.sheet_to_csv.
XLSX.stream.to_html is the streaming version of XLSX.utils.sheet_to_html.
XLSX.stream.to_json is the streaming version of XLSX.utils.sheet_to_json.
nodejs convert to CSV and write file (click to show)
var output_file_name = "out.csv";
var stream = XLSX.stream.to_csv(worksheet);
stream.pipe(fs.createWriteStream(output_file_name));
nodejs write JSON stream to screen (click to show)
var Transform = require("stream").Transform; // from the NodeJS stream module
/* to_json returns an object-mode stream */
var stream = XLSX.stream.to_json(worksheet, {raw:true});
/* the following stream converts JS objects to text via JSON.stringify */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };
stream.pipe(conv); conv.pipe(process.stdout);
https://github.com/sheetjs/sheetaki pipes write streams to nodejs response.
XLSX
is the exposed variable in the browser and the exported node variable
XLSX.version
is the version of the library (added by the build script).
XLSX.SSF
is an embedded version of the format library.
XLSX.read(data, read_opts)
attempts to parse data
.
XLSX.readFile(filename, read_opts)
attempts to read filename
and parse.
Parse options are described in the Parsing Options section.
XLSX.write(wb, write_opts)
attempts to write the workbook wb
XLSX.writeFile(wb, filename, write_opts)
attempts to write wb
to filename
. In browser-based environments, it will attempt to force a client-side download.
XLSX.writeFileAsync(wb, filename, o, cb)
attempts to write wb
to filename
. If o
is omitted, the writer will use the third argument as the callback.
XLSX.stream
contains a set of streaming write functions.
Write options are described in the Writing Options section.
Utilities are available in the XLSX.utils
object and are described in the Utility Functions section:
Constructing:
book_new creates an empty workbook
book_append_sheet adds a worksheet to a workbook
Importing:
aoa_to_sheet converts an array of arrays of JS data to a worksheet.
json_to_sheet converts an array of JS objects to a worksheet.
table_to_sheet converts a DOM TABLE element to a worksheet.
sheet_add_aoa adds an array of arrays of JS data to an existing worksheet.
sheet_add_json adds an array of JS objects to an existing worksheet.
Exporting:
sheet_to_json converts a worksheet object to an array of JSON objects.
sheet_to_csv generates delimiter-separated-values output.
sheet_to_txt generates UTF16 formatted text.
sheet_to_html generates HTML output.
sheet_to_formulae generates a list of the formulae (with value fallbacks).
Cell and cell address manipulation:
format_cell generates the text value for a cell (using number formats).
encode_row / decode_row converts between 0-indexed rows and 1-indexed rows.
encode_col / decode_col converts between 0-indexed columns and column names.
encode_cell / decode_cell converts cell addresses.
encode_range / decode_range converts cell ranges.
SheetJS conforms to the Common Spreadsheet Format (CSF):
Cell address objects are stored as {c:C, r:R}
where C
and R
are 0-indexed column and row numbers, respectively. For example, the cell address B5
is represented by the object {c:1, r:4}
.
Cell range objects are stored as {s:S, e:E}
where S
is the first cell and E
is the last cell in the range. The ranges are inclusive. For example, the range A3:B7
is represented by the object {s:{c:0, r:2}, e:{c:1, r:6}}
. Utility functions perform a row-major order walk traversal of a sheet range:
for(var R = range.s.r; R <= range.e.r; ++R) {
for(var C = range.s.c; C <= range.e.c; ++C) {
var cell_address = {c:C, r:R};
/* if an A1-style address is needed, encode the address */
var cell_ref = XLSX.utils.encode_cell(cell_address);
}
}
Cell objects are plain JS objects with keys and values following the convention:
Key | Description |
---|---|
v | raw value (see Data Types section for more info) |
w | formatted text (if applicable) |
t | type: b Boolean, e Error, n Number, d Date, s Text, z Stub |
f | cell formula encoded as an A1-style string (if applicable) |
F | range of enclosing array if formula is array formula (if applicable) |
r | rich text encoding (if applicable) |
h | HTML rendering of the rich text (if applicable) |
c | comments associated with the cell |
z | number format string associated with the cell (if requested) |
l | cell hyperlink object (.Target holds link, .Tooltip is tooltip) |
s | the style/theme of the cell (if applicable) |
Built-in export utilities (such as the CSV exporter) will use the w
text if it is available. To change a value, be sure to delete cell.w
(or set it to undefined
) before attempting to export. The utilities will regenerate the w
text from the number format (cell.z
) and the raw value if possible.
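A minimal sketch of that update pattern:
ws["A1"].v = 42;   // assign the new raw value
delete ws["A1"].w; // exporters regenerate text from ws["A1"].z and the raw value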
The actual array formula is stored in the f
field of the first cell in the array range. Other cells in the range will omit the f
field.
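For example, a sketch of a three-cell array formula range (the formula and addresses are illustrative):
ws["B1"] = { t: "n", f: "A1:A3*2", F: "B1:B3" }; // first cell holds the formula
ws["B2"] = { t: "n", F: "B1:B3" };               // other cells in the range omit f
ws["B3"] = { t: "n", F: "B1:B3" };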
The raw value is stored in the v
value property, interpreted based on the t
type property. This separation allows for representation of numbers as well as numeric text. There are 6 valid cell types:
Type | Description |
---|---|
b | Boolean: value interpreted as JS boolean |
e | Error: value is a numeric code and w property stores common name ** |
n | Number: value is a JS number ** |
d | Date: value is a JS Date object or string to be parsed as Date ** |
s | Text: value interpreted as JS string and written as text ** |
z | Stub: blank stub cell that is ignored by data processing utilities ** |
Error values and interpretation (click to show)
Value | Error Meaning |
---|---|
0x00 | #NULL! |
0x07 | #DIV/0! |
0x0F | #VALUE! |
0x17 | #REF! |
0x1D | #NAME? |
0x24 | #NUM! |
0x2A | #N/A |
0x2B | #GETTING_DATA |
Type n
is the Number type. This includes all forms of data that Excel stores as numbers, such as dates/times and Boolean fields. Excel exclusively uses data that can be fit in an IEEE754 floating point number, just like JS Number, so the v
field holds the raw number. The w
field holds formatted text. Dates are stored as numbers by default and converted with XLSX.SSF.parse_date_code
.
Type d
is the Date type, generated only when the option cellDates
is passed. Since JSON does not have a natural Date type, parsers are generally expected to store ISO 8601 Date strings like you would get from date.toISOString()
. On the other hand, writers and exporters should be able to handle date strings and JS Date objects. Note that Excel disregards timezone modifiers and treats all dates in the local timezone. The library does not correct for this error.
Type s
is the String type. Values are explicitly stored as text. Excel will interpret these cells as "number stored as text". Generated Excel files automatically suppress that class of error, but other formats may elicit errors.
Type z
represents blank stub cells. They are generated in cases where cells have no assigned value but hold comments or other metadata. They are ignored by the core library data processing utility functions. By default these cells are not generated; the parser sheetStubs
option must be set to true
.
Excel Date Code details (click to show)
By default, Excel stores dates as numbers with a format code that specifies date processing. For example, the date 19-Feb-17
is stored as the number 42785
with a number format of d-mmm-yy
. The SSF
module understands number formats and performs the appropriate conversion.
XLSX also supports a special date type d
where the data is an ISO 8601 date string. The formatter converts the date back to a number.
The default behavior for all parsers is to generate number cells. Setting cellDates
to true will force the generators to store dates.
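A minimal sketch of the option:
/* parse date cells as JS Date objects rather than raw date codes */
var workbook = XLSX.read(data, {cellDates: true});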
Time Zones and Dates (click to show)
Excel has no native concept of universal time. All times are specified in the local time zone. Excel limitations prevent specifying true absolute dates.
Following Excel, this library treats all dates as relative to local time zone.
Epochs: 1900 and 1904 (click to show)

Excel supports two epochs (January 1 1900 and January 1 1904). The workbook's epoch can be determined by examining the workbook's `wb.Workbook.WBProps.date1904` property:

!!(((wb.Workbook||{}).WBProps||{}).date1904)

Each key that does not start with `!` maps to a cell (using `A-1` notation). `sheet[address]` returns the cell object for the specified address.
Special sheet keys (accessible as `sheet[key]`, each starting with `!`):

`sheet['!ref']`: A-1 based range representing the sheet range. Functions that work with sheets should use this parameter to determine the range. Cells that are assigned outside of the range are not processed. In particular, when writing a sheet by hand, cells outside of the range are not included.

Functions that handle sheets should test for the presence of the `!ref` field. If the `!ref` is omitted or is not a valid range, functions are free to treat the sheet as empty or attempt to guess the range. The standard utilities that ship with this library treat sheets as empty (for example, the CSV output is an empty string).

When reading a worksheet with the `sheetRows` property set, the ref parameter will use the restricted range. The original range is set at `ws['!fullref']`.
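A minimal sketch of the recommended `!ref` guard (the helper name is an assumption, not a library export):

function get_sheet_range(ws) {
  if(!ws || !ws["!ref"]) return null; // missing !ref: treat the sheet as empty
  return XLSX.utils.decode_range(ws["!ref"]);
}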
`sheet['!margins']`: Object representing the page margins. The default values follow Excel's "normal" preset. Excel also has "wide" and "narrow" presets, but they are stored as raw measurements. The main properties are listed below:
Page margin details (click to show)
key | description | "normal" | "wide" | "narrow" |
---|---|---|---|---|
left | left margin (inches) | 0.7 | 1.0 | 0.25 |
right | right margin (inches) | 0.7 | 1.0 | 0.25 |
top | top margin (inches) | 0.75 | 1.0 | 0.75 |
bottom | bottom margin (inches) | 0.75 | 1.0 | 0.75 |
header | header margin (inches) | 0.3 | 0.5 | 0.3 |
footer | footer margin (inches) | 0.3 | 0.5 | 0.3 |
/* Set worksheet sheet to "normal" */
ws["!margins"]={left:0.7, right:0.7, top:0.75,bottom:0.75,header:0.3,footer:0.3}
/* Set worksheet sheet to "wide" */
ws["!margins"]={left:1.0, right:1.0, top:1.0, bottom:1.0, header:0.5,footer:0.5}
/* Set worksheet sheet to "narrow" */
ws["!margins"]={left:0.25,right:0.25,top:0.75,bottom:0.75,header:0.3,footer:0.3}
In addition to the base sheet keys, worksheets also add:

`ws['!cols']`: array of column properties objects. Column widths are actually stored in files in a normalized manner, measured in terms of the "Maximum Digit Width" (the largest width of the rendered digits 0-9, in pixels). When parsed, the column objects store the pixel width in the `wpx` field, character width in the `wch` field, and the maximum digit width in the `MDW` field.

`ws['!rows']`: array of row properties objects as explained later in the docs. Each row object encodes properties including row height and visibility.

`ws['!merges']`: array of range objects corresponding to the merged cells in the worksheet. Plain text formats do not support merge cells. CSV export will write all cells in the merge range if they exist, so be sure that only the first cell (upper-left) in the range is set.

`ws['!outline']`: configure how outlines should behave. Options default to the default settings in Excel 2019:
key | Excel feature | default |
---|---|---|
above | Uncheck "Summary rows below detail" | false |
left | Uncheck "Summary rows to the right of detail" | false |
`ws['!protect']`: object of write sheet protection properties. The `password` key specifies the password for formats that support password-protected sheets (XLSX/XLSB/XLS). The writer uses the XOR obfuscation method. The following keys control the sheet protection -- set to `false` to enable a feature when the sheet is locked or set to `true` to disable a feature:

Worksheet Protection Details (click to show)
key | feature (true=disabled / false=enabled) | default |
---|---|---|
selectLockedCells | Select locked cells | enabled |
selectUnlockedCells | Select unlocked cells | enabled |
formatCells | Format cells | disabled |
formatColumns | Format columns | disabled |
formatRows | Format rows | disabled |
insertColumns | Insert columns | disabled |
insertRows | Insert rows | disabled |
insertHyperlinks | Insert hyperlinks | disabled |
deleteColumns | Delete columns | disabled |
deleteRows | Delete rows | disabled |
sort | Sort | disabled |
autoFilter | Filter | disabled |
pivotTables | Use PivotTable reports | disabled |
objects | Edit objects | enabled |
scenarios | Edit scenarios | enabled |
`ws['!autofilter']`: AutoFilter object following the schema:

type AutoFilter = {
  ref:string; // A-1 based range representing the AutoFilter table range
}
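As a combined sketch (not from the original docs; `ws` is assumed to be an existing worksheet), the following merges a title row and adds an AutoFilter over the header row:

/* merge A1:G1 as a title; only cell A1 should hold the title text */
if(!ws["!merges"]) ws["!merges"] = [];
ws["!merges"].push(XLSX.utils.decode_range("A1:G1"));
/* filter on the header row */
ws["!autofilter"] = { ref: "A2:G2" };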
Chartsheets are represented as standard sheets. They are distinguished with the `!type` property set to `"chart"`.

The underlying data and `!ref` refer to the cached data in the chartsheet. The first row of the chartsheet is the underlying header.

Macrosheets are represented as standard sheets. They are distinguished with the `!type` property set to `"macro"`.

Dialogsheets are represented as standard sheets. They are distinguished with the `!type` property set to `"dialog"`.
`workbook.SheetNames` is an ordered list of the sheets in the workbook.

`wb.Sheets[sheetname]` returns an object representing the worksheet.

`wb.Props` is an object storing the standard properties. `wb.Custprops` stores custom properties. Since the XLS standard properties deviate from the XLSX standard, XLS parsing stores core properties in both places.

`wb.Workbook` stores workbook-level attributes.

The various file formats use different internal names for file properties. The workbook `Props` object normalizes the names:
File Properties (click to show)
JS Name | Excel Description |
---|---|
Title | Summary tab "Title" |
Subject | Summary tab "Subject" |
Author | Summary tab "Author" |
Manager | Summary tab "Manager" |
Company | Summary tab "Company" |
Category | Summary tab "Category" |
Keywords | Summary tab "Keywords" |
Comments | Summary tab "Comments" |
LastAuthor | Statistics tab "Last saved by" |
CreatedDate | Statistics tab "Created" |
For example, to set the workbook title property:

if(!wb.Props) wb.Props = {};
wb.Props.Title = "Insert Title Here";

Custom properties are added in the workbook `Custprops` object:

if(!wb.Custprops) wb.Custprops = {};
wb.Custprops["Custom Property"] = "Custom Value";

Writers will process the `Props` key of the options object:

/* force the Author to be "SheetJS" */
XLSX.write(wb, {Props:{Author:"SheetJS"}});

`wb.Workbook` stores workbook-level attributes.

`wb.Workbook.Names` is an array of defined name objects which have the keys:
Defined Name Properties (click to show)
Key | Description |
---|---|
Sheet | Name scope. Sheet Index (0 = first sheet) or null (Workbook) |
Name | Case-sensitive name. Standard rules apply ** |
Ref | A1-style Reference ("Sheet1!$A$1:$D$20" ) |
Comment | Comment (only applicable for XLS/XLSX/XLSB) |
Excel allows two sheet-scoped defined names to share the same name. However, a sheet-scoped name cannot collide with a workbook-scope name. Workbook writers may not enforce this constraint.
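For example, a sketch that adds a workbook-scoped defined name (the name and range are assumptions):

if(!wb.Workbook) wb.Workbook = {};
if(!wb.Workbook.Names) wb.Workbook.Names = [];
wb.Workbook.Names.push({ Name: "SourceData", Ref: "Sheet1!$A$1:$D$20" }); // Sheet: null implies workbook scope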
`wb.Workbook.Views` is an array of workbook view objects which have the keys:
Key | Description |
---|---|
RTL | If true, display right-to-left |
`wb.Workbook.WBProps` holds other workbook properties:
Key | Description |
---|---|
CodeName | VBA Project Workbook Code Name |
date1904 | epoch: 0/false for 1900 system, 1/true for 1904 |
filterPrivacy | Warn or strip personally identifying info on save |
Even for basic features like date storage, the official Excel formats store the same content in different ways. The parsers are expected to convert from the underlying file format representation to the Common Spreadsheet Format. Writers are expected to convert from CSF back to the underlying file format.

The A1-style formula string is stored in the `f` field. Even though different file formats store the formulae in different ways, the formats are translated. Even though some formats store formulae with a leading equal sign, CSF formulae do not start with `=`.
Representation of A1=1, A2=2, A3=A1+A2 (click to show)
{
"!ref": "A1:A3",
A1: { t:'n', v:1 },
A2: { t:'n', v:2 },
A3: { t:'n', v:3, f:'A1+A2' }
}
Shared formulae are decompressed and each cell has the formula corresponding to its cell. Writers generally do not attempt to generate shared formulae.

Cells with formula entries but no value will be serialized in a way that Excel and other spreadsheet tools will recognize. This library will not automatically compute formula results! For example, to compute `BESSELJ` in a worksheet:
Formula without known value (click to show)
{
"!ref": "A1:A3",
A1: { t:'n', v:3.14159 },
A2: { t:'n', v:2 },
A3: { t:'n', f:'BESSELJ(A1,A2)' }
}
Array Formulae

Array formulae are stored in the top-left cell of the array block. All cells of an array formula have an `F` field corresponding to the range. A single-cell array formula can be distinguished from a plain formula by the presence of the `F` field.
Array Formula examples (click to show)

For example, setting the cell `C1` to the array formula `{=SUM(A1:A3*B1:B3)}`:

worksheet['C1'] = { t:'n', f: "SUM(A1:A3*B1:B3)", F:"C1:C1" };

For a multi-cell array formula, every cell has the same array range but only the first cell specifies the formula. Consider `D1:D3=A1:A3*B1:B3`:
worksheet['D1'] = { t:'n', F:"D1:D3", f:"A1:A3*B1:B3" };
worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
Utilities and writers are expected to check for the presence of an `F` field and ignore any possible formula element `f` in cells other than the starting cell. They are not expected to perform validation of the formulae!

Formula Output Utility Function (click to show)

The `sheet_to_formulae` method generates one line per formula or array formula. Array formulae are rendered in the form `range=formula` while plain cells are rendered in the form `cell=formula or value`. Note that string literals are prefixed with an apostrophe `'`, consistent with Excel's formula bar display.
Formulae File Format Details (click to show)
Storage Representation | Formats | Read | Write |
---|---|---|---|
A1-style strings | XLSX | ✔ | ✔ |
RC-style strings | XLML and plain text | ✔ | ✔ |
BIFF Parsed formulae | XLSB and all XLS formats | ✔ | |
OpenFormula formulae | ODS/FODS/UOS | ✔ | ✔ |
Lotus Parsed formulae | All Lotus WK_ formats | ✔ |
Since Excel prohibits named cells from colliding with names of A1 or RC style cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed formulae and Lotus Parsed formulae have to be explicitly unwound. OpenFormula formulae can be converted with regular expressions.
Format Support (click to show)
Row Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS
Column Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM
Row and Column properties are not extracted by default when reading from a file and are not persisted by default when writing to a file. The option `cellStyles: true` must be passed to the relevant read or write function.

Column Properties

The `!cols` array in each worksheet, if present, is a collection of `ColInfo` objects which have the following properties:
type ColInfo = {
/* visibility */
hidden?: boolean; // if true, the column is hidden
/* column width is specified in one of the following ways: */
wpx?: number; // width in screen pixels
width?: number; // width in Excel's "Max Digit Width", width*256 is integral
wch?: number; // width in characters
/* other fields for preserving features from files */
level?: number; // 0-indexed outline / group level
MDW?: number; // Excel's "Max Digit Width" unit, always integral
};
Row Properties

The `!rows` array in each worksheet, if present, is a collection of `RowInfo` objects which have the following properties:
type RowInfo = {
/* visibility */
hidden?: boolean; // if true, the row is hidden
/* row height is specified in one of the following ways: */
hpx?: number; // height in screen pixels
hpt?: number; // height in points
level?: number; // 0-indexed outline / group level
};
Outline / Group Levels Convention

The Excel UI displays the base outline level as `1` and the max level as `8`. Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base outline level is `0` and the max level is `7`.
Why are there three width types? (click to show)
There are three different width types corresponding to the three different ways spreadsheets store column widths:
SYLK and other plain text formats use raw character count. Contemporaneous tools like Visicalc and Multiplan were character based. Since the characters had the same width, it sufficed to store a count. This tradition was continued into the BIFF formats.
SpreadsheetML (2003) tried to align with HTML by standardizing on screen pixel count throughout the file. Column widths, row heights, and other measures use pixels. When the pixel and character counts do not align, Excel rounds values.
XLSX internally stores column widths in a nebulous "Max Digit Width" form. The Max Digit Width is the width of the largest digit when rendered (generally the "0" character is the widest). The internal width must be an integer multiple of the width divided by 256. ECMA-376 describes a formula for converting between pixels and the internal width. This represents a hybrid approach.
Read functions attempt to populate all three properties. Write functions will try to cycle specified values to the desired type. In order to avoid potential conflicts, manipulation should delete the other properties first. For example, when changing the pixel width, delete the `wch` and `width` properties.
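A sketch of the recommended manipulation (the pixel width is an assumption; `ws` is an existing worksheet):

if(!ws["!cols"]) ws["!cols"] = [];
if(!ws["!cols"][0]) ws["!cols"][0] = {};
delete ws["!cols"][0].wch;   // clear the character width
delete ws["!cols"][0].width; // clear the Max Digit Width form
ws["!cols"][0].wpx = 120;    // set the desired pixel width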
Implementation details (click to show)
Row Heights
Excel internally stores row heights in points. The default resolution is 72 DPI or 96 PPI, so the pixel and point size should agree. For different resolutions they may not agree, so the library separates the concepts.
Even though all of the information is made available, writers are expected to follow the priority order:

1. use the `hpx` pixel height if available
2. use the `hpt` point height if available

Column Widths
Given the constraints, it is possible to determine the MDW without actually inspecting the font! The parsers guess the pixel width by converting from width to pixels and back, repeating for all possible MDW and selecting the MDW that minimizes the error. XLML actually stores the pixel width, so the guess works in the opposite direction.
Even though all of the information is made available, writers are expected to follow the priority order:

1. use the `width` field if available
2. use the `wpx` pixel width if available
3. use the `wch` character count if available

The `cell.w` formatted text for each cell is produced from `cell.v` and the `cell.z` format. If the format is not specified, the Excel `General` format is used. The format can either be specified as a string or as an index into the format table. Parsers are expected to populate `workbook.SSF` with the number format table. Writers are expected to serialize the table.
Custom tools should ensure that the local table has each used format string somewhere in the table. Excel convention mandates that the custom formats start at index 164. The following example creates a custom format from scratch:
New worksheet with custom format (click to show)
var wb = {
SheetNames: ["Sheet1"],
Sheets: {
Sheet1: {
"!ref":"A1:C1",
A1: { t:"n", v:10000 }, // <-- General format
B1: { t:"n", v:10000, z: "0%" }, // <-- Builtin format
C1: { t:"n", v:10000, z: "\"T\"\ #0.00" } // <-- Custom format
}
}
}
The rules are slightly different from how Excel displays custom number formats. In particular, literal characters must be wrapped in double quotes or preceded by a backslash. For more info, see the Excel documentation article "Create or delete a custom number format" or ECMA-376 18.8.31 (Number Formats).
Default Number Formats (click to show)
The default formats are listed in ECMA-376 18.8.30:
ID | Format |
---|---|
0 | General |
1 | 0 |
2 | 0.00 |
3 | #,##0 |
4 | #,##0.00 |
9 | 0% |
10 | 0.00% |
11 | 0.00E+00 |
12 | # ?/? |
13 | # ??/?? |
14 | m/d/yy (see below) |
15 | d-mmm-yy |
16 | d-mmm |
17 | mmm-yy |
18 | h:mm AM/PM |
19 | h:mm:ss AM/PM |
20 | h:mm |
21 | h:mm:ss |
22 | m/d/yy h:mm |
37 | #,##0 ;(#,##0) |
38 | #,##0 ;[Red](#,##0) |
39 | #,##0.00;(#,##0.00) |
40 | #,##0.00;[Red](#,##0.00) |
45 | mm:ss |
46 | [h]:mm:ss |
47 | mmss.0 |
48 | ##0.0E+0 |
49 | @ |
Format 14 (`m/d/yy`) is localized by Excel: even though the file specifies that number format, it will be drawn differently based on system settings. It makes sense when the producer and consumer of files are in the same locale, but that is not always the case over the Internet. To get around this ambiguity, parse functions accept the `dateNF` option to override the interpretation of that specific format string.
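A sketch of the override (the input `data` and the format string are assumptions):

var wb = XLSX.read(data, { type: "array", dateNF: "yyyy-mm-dd" });
// formatted text for format-14 cells will use yyyy-mm-dd instead of the locale default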
Format Support (click to show)
Cell Hyperlinks: XLSX/M, XLSB, BIFF8 XLS, XLML, ODS
Tooltips: XLSX/M, XLSB, BIFF8 XLS, XLML
Hyperlinks are stored in the `l` key of cell objects. The `Target` field of the hyperlink object is the target of the link, including the URI fragment. Tooltips are stored in the `Tooltip` field and are displayed when you move your mouse over the text.

For example, the following snippet creates a link from cell `A1` to https://sheetjs.com with the tip "Find us @ SheetJS.com!":

ws['A1'].l = { Target:"https://sheetjs.com", Tooltip:"Find us @ SheetJS.com!" };
Note that Excel does not automatically style hyperlinks -- they will generally be displayed as normal text.
Remote Links
HTTP / HTTPS links can be used directly:
ws['A2'].l = { Target:"https://docs.sheetjs.com/#hyperlinks" };
ws['A3'].l = { Target:"http://localhost:7262/yes_localhost_works" };
Excel also supports `mailto` email links with subject line:
ws['A4'].l = { Target:"mailto:ignored@dev.null" };
ws['A5'].l = { Target:"mailto:ignored@dev.null?subject=Test Subject" };
Local Links
Links to absolute paths should use the `file://` URI scheme:
ws['B1'].l = { Target:"file:///SheetJS/t.xlsx" }; /* Link to /SheetJS/t.xlsx */
ws['B2'].l = { Target:"file:///c:/SheetJS.xlsx" }; /* Link to c:\SheetJS.xlsx */
Links to relative paths can be specified without a scheme:
ws['B3'].l = { Target:"SheetJS.xlsb" }; /* Link to SheetJS.xlsb */
ws['B4'].l = { Target:"../SheetJS.xlsm" }; /* Link to ../SheetJS.xlsm */
Relative Paths have undefined behavior in the SpreadsheetML 2003 format. Excel 2019 will treat a `..\` parent mark as two levels up.
Internal Links
Links where the target is a cell or range or defined name in the same workbook ("Internal Links") are marked with a leading hash character:
ws['C1'].l = { Target:"#E2" }; /* Link to cell E2 */
ws['C2'].l = { Target:"#Sheet2!E2" }; /* Link to cell E2 in sheet Sheet2 */
ws['C3'].l = { Target:"#SomeDefinedName" }; /* Link to Defined Name */
Cell comments are objects stored in the `c` array of cell objects. The actual contents of the comment are split into blocks based on the comment author. The `a` field of each comment object is the author of the comment and the `t` field is the plain text representation.
For example, the following snippet appends a cell comment into cell `A1`:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"I'm a little comment, short and stout!"});
Note: XLSB enforces a 54 character limit on the Author name. Names longer than 54 characters may cause issues with other formats.
To mark a comment as normally hidden, set the `hidden` property:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This comment is visible"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This comment will be hidden"});
Excel enables hiding sheets in the lower tab bar. The sheet data is stored in the file but the UI does not readily make it available. Standard hidden sheets are revealed in the "Unhide" menu. Excel also has "very hidden" sheets which cannot be revealed in the menu. It is only accessible in the VB Editor!
The visibility setting is stored in the `Hidden` property of the sheet props array.
More details (click to show)
Value | Definition |
---|---|
0 | Visible |
1 | Hidden |
2 | Very Hidden |
With https://rawgit.com/SheetJS/test_files/HEAD/sheet_visibility.xlsx:
> wb.Workbook.Sheets.map(function(x) { return [x.name, x.Hidden] })
[ [ 'Visible', 0 ], [ 'Hidden', 1 ], [ 'VeryHidden', 2 ] ]
Non-Excel formats do not support the Very Hidden state. The best way to test if a sheet is visible is to check if the `Hidden` property is logical truth:
> wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
[ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]
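Conversely, a sketch that hides the second sheet before writing (the index is an assumption):

if(wb.Workbook && wb.Workbook.Sheets && wb.Workbook.Sheets[1]) {
  wb.Workbook.Sheets[1].Hidden = 1; // 1 = Hidden; use 2 for Very Hidden
}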
VBA Macros are stored in a special data blob that is exposed in the `vbaraw` property of the workbook object when the `bookVBA` option is `true`. They are supported in `XLSM`, `XLSB`, and `BIFF8 XLS` formats. The supported format writers automatically insert the data blob if it is present in the workbook and associate it with the worksheet names.
Custom Code Names (click to show)

The workbook code name is stored in `wb.Workbook.WBProps.CodeName`. By default, Excel will write `ThisWorkbook` or a translated phrase like `DieseArbeitsmappe`. Worksheet and Chartsheet code names are in the worksheet properties object at `wb.Workbook.Sheets[i].CodeName`. Macrosheets and Dialogsheets are ignored.

The readers and writers preserve the code names, but they have to be manually set when adding a VBA blob to a different workbook.
Macrosheets (click to show)

Older versions of Excel also supported a non-VBA "macrosheet" sheet type that stored automation commands. These are exposed in objects with the `!type` property set to `"macro"`.
Detecting macros in workbooks (click to show)

The `vbaraw` field will only be set if macros are present, so testing is simple:
function wb_has_macro(wb/*:workbook*/)/*:boolean*/ {
if(!!wb.vbaraw) return true;
const sheets = wb.SheetNames.map((n) => wb.Sheets[n]);
return sheets.some((ws) => !!ws && ws['!type']=='macro');
}
The exported `read` and `readFile` functions accept an options argument:
Option Name | Default | Description |
---|---|---|
type | Input data encoding (see Input Type below) | |
raw | false | If true, plain text parsing will not parse values ** |
codepage | If specified, use code page when appropriate ** | |
cellFormula | true | Save formulae to the .f field |
cellHTML | true | Parse rich text and save HTML to the .h field |
cellNF | false | Save number format string to the .z field |
cellStyles | false | Save style/theme info to the .s field |
cellText | true | Generate formatted text to the .w field |
cellDates | false | Store dates as type d (default is n ) |
dateNF | If specified, use the string for date code 14 ** | |
sheetStubs | false | Create cell objects of type z for stub cells |
sheetRows | 0 | If >0, read the first sheetRows rows ** |
bookDeps | false | If true, parse calculation chains |
bookFiles | false | If true, add raw files to book object ** |
bookProps | false | If true, only parse enough to get book metadata ** |
bookSheets | false | If true, only parse enough to get the sheet names |
bookVBA | false | If true, copy VBA blob to vbaraw field ** |
password | "" | If defined and file is encrypted, use password ** |
WTF | false | If true, throw errors on unexpected file features ** |
sheets | If specified, only parse specified sheets ** | |
PRN | false | If true, allow parsing of PRN files ** |
xlfn | false | If true, preserve _xlfn. prefixes in formulae ** |
FS | DSV Field Separator override |
- Even if `cellNF` is false, formatted text will be generated and saved to `.w`
- In some cases, sheets may be parsed even if `bookSheets` is false.
- Excel aggressively tries to interpret values from CSV and other plain text. The `raw` option suppresses value parsing.
- `bookSheets` and `bookProps` combine to give both sets of information
- `Deps` will be an empty object if `bookDeps` is false
- `bookFiles` behavior depends on file type: `keys` array (paths in the ZIP) for ZIP-based formats; `files` hash (mapping paths to objects representing the files) for ZIP; `cfb` object for formats using CFB containers
- `sheetRows-1` rows will be generated when looking at the JSON object output (since the header row is counted as a row when parsing the data)
- `sheets` restricts parsing based on input type (a number is interpreted as a worksheet index, where `0` is the first worksheet)
- `bookVBA` merely exposes the raw VBA CFB object. It does not parse the data. XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes the VBA entries alongside the core Workbook entry, so the library generates a new XLSB-compatible blob from the XLS CFB container.
- `codepage` is applied to BIFF2 - BIFF5 files without `CodePage` records and to CSV files without BOM in `type:"binary"`. BIFF8 XLS always defaults to 1200.
- `PRN` affects parsing of text files without a common delimiter character.
- Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the user. SheetJS will strip `_xlfn.` normally; the `xlfn` option preserves them.
- By default, the parser suppresses read errors on single worksheets; `WTF:true` forces those errors to be thrown.

Strings can be interpreted in multiple ways. The `type` parameter for `read` tells the library how to parse the data argument:
type | expected input |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n) ) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | array: array of 8-bit unsigned int (byte n is data[n] ) |
"file" | string: path of file that will be read (nodejs only) |
Implementation Details (click to show)
Excel and other spreadsheet tools read the first few bytes and apply other heuristics to determine a file type. This enables file type punning: renaming files with the .xls
extension will tell your computer to use Excel to open the file but Excel will know how to handle it. This library applies similar logic:
Byte 0 | Raw File Type | Spreadsheet Types |
---|---|---|
0xD0 | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR |
0x09 | BIFF Stream | BIFF 2/3/4/5 |
0x3C | XML/HTML | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x50 | ZIP Archive | XLSB or XLSX/M or ODS or UOS2 or NUMBERS or text |
0x49 | Plain Text | SYLK or plain text |
0x54 | Plain Text | DIF or plain text |
0xEF | UTF8 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0xFF | UTF16 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x00 | Record Stream | Lotus WK* or Quattro Pro or plain text |
0x7B | Plain text | RTF or plain text |
0x0A | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x0D | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x20 | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
DBF files are detected based on the first byte as well as the third and fourth bytes (corresponding to month and day of the file date)
Works for Windows files are detected based on the BOF record with type 0xFF
Plain text format guessing follows the priority order:
Format | Test |
---|---|
XML | <?xml appears in the first 1024 characters |
HTML | starts with < and HTML tags appear in the first 1024 characters * |
XML | starts with < and the first tag is valid |
RTF | starts with {\rt |
DSV | starts with /sep=.$/ , separator is the specified character |
DSV | more unquoted \| chars than ; \t or , in the first 1024 |
DSV | more unquoted ; chars than \t or , in the first 1024 |
TSV | more unquoted \t chars than , chars in the first 1024 |
CSV | one of the first 1024 characters is a comma "," |
ETH | starts with socialcalc:version: |
PRN | PRN option is set to true |
CSV | (fallback) |
* Recognized HTML tags include: `html`, `table`, `head`, `meta`, `script`, `style`, `div`
Why are random text files valid? (click to show)
Excel is extremely aggressive in reading files. Adding an XLS extension to any display text file (where the only characters are ANSI display chars) tricks Excel into thinking that the file is potentially a CSV or TSV file, even if it is only one column! This library attempts to replicate that behavior.
The best approach is to validate the desired worksheet and ensure it has the expected number of rows or columns. Extracting the range is extremely simple:
var range = XLSX.utils.decode_range(worksheet['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;
The exported `write` and `writeFile` functions accept an options argument:
Option Name | Default | Description |
---|---|---|
type | Output data encoding (see Output Type below) | |
cellDates | false | Store dates as type d (default is n ) |
bookSST | false | Generate Shared String Table ** |
bookType | "xlsx" | Type of Workbook (see below for supported formats) |
sheet | "" | Name of Worksheet for single-sheet formats ** |
compression | false | Use ZIP compression for ZIP-based formats ** |
Props | Override workbook properties when writing ** | |
themeXLSX | Override theme XML when writing XLSX/XLSB/XLSM ** | |
ignoreEC | true | Suppress "number as text" errors ** |
- `bookSST` is slower and more memory intensive, but has better compatibility with older versions of iOS Numbers
- `cellDates` only applies to XLSX output and is not guaranteed to work with third-party readers. Excel itself does not usually write cells with type `d` so non-Excel tools may ignore the data or error in the presence of dates.
- `Props` is an object mirroring the workbook `Props` field. See the table from the Workbook File Properties section.
- `themeXLSX` will be saved as the primary theme for XLSX/XLSB/XLSM files (to `xl/theme/theme1.xml` in the ZIP)
- `ignoreEC` suppresses "number as text" errors by default; set `ignoreEC` to `false` to write the error indicators.

For broad compatibility with third-party tools, this library supports many output formats. The specific file type is controlled with the `bookType` option:
bookType | file ext | container | sheets | Description |
---|---|---|---|---|
xlsx | .xlsx | ZIP | multi | Excel 2007+ XML Format |
xlsm | .xlsm | ZIP | multi | Excel 2007+ Macro XML Format |
xlsb | .xlsb | ZIP | multi | Excel 2007+ Binary Format |
biff8 | .xls | CFB | multi | Excel 97-2004 Workbook Format |
biff5 | .xls | CFB | multi | Excel 5.0/95 Workbook Format |
biff4 | .xls | none | single | Excel 4.0 Worksheet Format |
biff3 | .xls | none | single | Excel 3.0 Worksheet Format |
biff2 | .xls | none | single | Excel 2.0 Worksheet Format |
xlml | .xls | none | multi | Excel 2003-2004 (SpreadsheetML) |
ods | .ods | ZIP | multi | OpenDocument Spreadsheet |
fods | .fods | none | multi | Flat OpenDocument Spreadsheet |
wk3 | .wk3 | none | single | Lotus Workbook (WK3) |
csv | .csv | none | single | Comma Separated Values |
txt | .txt | none | single | UTF-16 Unicode Text (TXT) |
sylk | .sylk | none | single | Symbolic Link (SYLK) |
html | .html | none | single | HTML Document |
dif | .dif | none | single | Data Interchange Format (DIF) |
dbf | .dbf | none | single | dBASE II + VFP Extensions (DBF) |
wk1 | .wk1 | none | single | Lotus Worksheet (WK1) |
rtf | .rtf | none | single | Rich Text Format (RTF) |
prn | .prn | none | single | Lotus Formatted Text |
eth | .eth | none | single | Ethercalc Record Format (ETH) |
- `compression` only applies to formats with ZIP containers.
- Formats that only support a single sheet require a `sheet` option specifying the worksheet. If the string is empty, the first worksheet is used.
- `writeFile` will automatically guess the output file format based on the file extension if `bookType` is not specified. It will choose the first format in the aforementioned table that matches the extension.

The `type` argument for `write` mirrors the `type` argument for `read`:
type | output |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n) ) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | ArrayBuffer, fallback array of 8-bit unsigned int |
"file" | string: path of file that will be created (nodejs only) |
The `sheet_to_*` functions accept a worksheet and an optional options object.

The `*_to_sheet` functions accept a data object and an optional options object.

The examples are based on the following worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
3 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
`XLSX.utils.aoa_to_sheet` takes an array of arrays of JS values and returns a worksheet resembling the input data. Numbers, Booleans and Strings are stored as the corresponding types. Dates are stored as dates or numbers. Array holes and explicit `undefined` values are skipped. `null` values may be stubbed. All other values are stored as strings. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetStubs | false | Create cell objects of type z for null values |
nullError | false | If true, emit #NULL! error cells for null values |
Examples (click to show)
To generate the example sheet:
var ws = XLSX.utils.aoa_to_sheet([
"SheetJS".split(""),
[1,2,3,4,5,6,7],
[2,3,4,5,6,7,8]
]);
`XLSX.utils.sheet_add_aoa` takes an array of arrays of JS values and updates an existing worksheet object. It follows the same process as `aoa_to_sheet` and accepts an options argument:
Option Name | Default | Description |
---|---|---|
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetStubs | false | Create cell objects of type z for null values |
nullError | false | If true, emit #NULL! error cells for null values |
origin | Use specified cell as starting point (see below) |
`origin` is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
Consider the worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | | | 5 | 6 | 7 |
3 | 2 | 3 | | | 6 | 7 | 8 |
4 | 3 | 4 | | | 7 | 8 | 9 |
5 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
This worksheet can be built up in the order `A1:G1, A2:B4, E2:G4, A5:G5`:
/* Initial row */
var ws = XLSX.utils.aoa_to_sheet([ "SheetJS".split("") ]);
/* Write data starting at A2 */
XLSX.utils.sheet_add_aoa(ws, [[1,2], [2,3], [3,4]], {origin: "A2"});
/* Write data starting at E2 */
XLSX.utils.sheet_add_aoa(ws, [[5,6,7], [6,7,8], [7,8,9]], {origin:{r:1, c:4}});
/* Append row */
XLSX.utils.sheet_add_aoa(ws, [[4,5,6,7,8,9,0]], {origin: -1});
`XLSX.utils.json_to_sheet` takes an array of objects and returns a worksheet with automatically-generated "headers" based on the keys of the objects. The default column order is determined by the first appearance of the field using `Object.keys`. The function accepts an options argument:
Option Name | Default | Description |
---|---|---|
header | Use specified field order (default Object.keys ) ** | |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
skipHeader | false | If true, do not include header row in output |
nullError | false | If true, emit #NULL! error cells for null values |
- If `header` is an array and it does not contain a particular field, the key will be appended to the array.
- Cell types are deduced from the type of each value. For example, a `Date` object will generate a Date cell, while a string will generate a Text cell.
- Null values are skipped by default. If `nullError` is true, an error cell corresponding to `#NULL!` will be written to the worksheet.

Examples (click to show)
The original sheet cannot be reproduced using plain objects since JS object keys must be unique. After replacing the second `e` and `S` with `e_1` and `S_1`:
var ws = XLSX.utils.json_to_sheet([
{ S:1, h:2, e:3, e_1:4, t:5, J:6, S_1:7 },
{ S:2, h:3, e:4, e_1:5, t:6, J:7, S_1:8 }
], {header:["S","h","e","e_1","t","J","S_1"]});
Alternatively, the header row can be skipped:
var ws = XLSX.utils.json_to_sheet([
{ A:"S", B:"h", C:"e", D:"e", E:"t", F:"J", G:"S" },
{ A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7 },
{ A: 2, B: 3, C: 4, D: 5, E: 6, F: 7, G: 8 }
], {header:["A","B","C","D","E","F","G"], skipHeader:true});
`XLSX.utils.sheet_add_json` takes an array of objects and updates an existing worksheet object. It follows the same process as `json_to_sheet` and accepts an options argument:
Option Name | Default | Description |
---|---|---|
header | Use specified column order (default Object.keys ) | |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
skipHeader | false | If true, do not include header row in output |
nullError | false | If true, emit #NULL! error cells for null values |
origin | Use specified cell as starting point (see below) |
`origin` is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
Consider the worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | | | 5 | 6 | 7 |
3 | 2 | 3 | | | 6 | 7 | 8 |
4 | 3 | 4 | | | 7 | 8 | 9 |
5 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
This worksheet can be built up in the order `A1:G1, A2:B4, E2:G4, A5:G5`:
/* Initial row */
var ws = XLSX.utils.json_to_sheet([
{ A: "S", B: "h", C: "e", D: "e", E: "t", F: "J", G: "S" }
], {header: ["A", "B", "C", "D", "E", "F", "G"], skipHeader: true});
/* Write data starting at A2 */
XLSX.utils.sheet_add_json(ws, [
{ A: 1, B: 2 }, { A: 2, B: 3 }, { A: 3, B: 4 }
], {skipHeader: true, origin: "A2"});
/* Write data starting at E2 */
XLSX.utils.sheet_add_json(ws, [
{ A: 5, B: 6, C: 7 }, { A: 6, B: 7, C: 8 }, { A: 7, B: 8, C: 9 }
], {skipHeader: true, origin: { r: 1, c: 4 }, header: [ "A", "B", "C" ]});
/* Append row */
XLSX.utils.sheet_add_json(ws, [
{ A: 4, B: 5, C: 6, D: 7, E: 8, F: 9, G: 0 }
], {header: ["A", "B", "C", "D", "E", "F", "G"], skipHeader: true, origin: -1});
`XLSX.utils.table_to_sheet` takes a table DOM element and returns a worksheet resembling the input table. Numbers are parsed. All other data will be stored as strings.

`XLSX.utils.table_to_book` produces a minimal workbook based on the worksheet.

Both functions accept options arguments:
Option Name | Default | Description |
---|---|---|
raw | If true, every cell will hold raw strings | |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetRows | 0 | If >0, read the first sheetRows rows of the table |
display | false | If true, hidden rows and cells will not be parsed |
Examples (click to show)
To generate the example sheet, start with the HTML table:
<table id="sheetjs">
<tr><td>S</td><td>h</td><td>e</td><td>e</td><td>t</td><td>J</td><td>S</td></tr>
<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td></tr>
<tr><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td></tr>
</table>
To process the table:
var tbl = document.getElementById('sheetjs');
var wb = XLSX.utils.table_to_book(tbl);
Note: `XLSX.read` can handle HTML represented as strings.

`XLSX.utils.sheet_add_dom` takes a table DOM element and updates an existing worksheet object. It follows the same process as `table_to_sheet` and accepts an options argument:
Option Name | Default | Description |
---|---|---|
raw | If true, every cell will hold raw strings | |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetRows | 0 | If >0, read the first sheetRows rows of the table |
display | false | If true, hidden rows and cells will not be parsed |
`origin` is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
A small helper function can create gap rows between tables:
function create_gap_rows(ws, nrows) {
var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
ref.e.r += nrows; // add to ending row
ws["!ref"] = XLSX.utils.encode_range(ref); // reassign row
}
/* first table */
var ws = XLSX.utils.table_to_sheet(document.getElementById('table1'));
create_gap_rows(ws, 1); // one row gap after first table
/* second table */
XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
create_gap_rows(ws, 3); // three rows gap after second table
/* third table */
XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});
`XLSX.utils.sheet_to_formulae` generates an array of commands that represent how a person would enter data into an application. Each entry is of the form `A1-cell-address=formula-or-value`. String literals are prefixed with a `'` in accordance with Excel.
Examples (click to show)
For the example sheet:
> var o = XLSX.utils.sheet_to_formulae(ws);
> [o[0], o[5], o[10], o[15], o[20]];
[ 'A1=\'S', 'F1=\'J', 'D2=4', 'B3=3', 'G3=8' ]
As an alternative to the `writeFile` CSV type, `XLSX.utils.sheet_to_csv` also produces CSV output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
FS | "," | "Field Separator" delimiter between fields |
RS | "\n" | "Record Separator" delimiter between rows |
dateNF | FMT 14 | Use specified date format in string output |
strip | false | Remove trailing field separators in each record ** |
blankrows | true | Include blank lines in the CSV output |
skipHidden | false | Skips hidden rows/columns in the CSV output |
forceQuotes | false | Force quotes around fields |
- `strip` will remove trailing commas from each line under default `FS/RS`
- `blankrows` must be set to `false` to skip blank lines.
- `forceQuotes` forces all cells to be wrapped in quotes.

Examples (click to show)
For the example sheet:
> console.log(XLSX.utils.sheet_to_csv(ws));
S,h,e,e,t,J,S
1,2,3,4,5,6,7
2,3,4,5,6,7,8
> console.log(XLSX.utils.sheet_to_csv(ws, {FS:"\t"}));
S h e e t J S
1 2 3 4 5 6 7
2 3 4 5 6 7 8
> console.log(XLSX.utils.sheet_to_csv(ws,{FS:":",RS:"|"}));
S:h:e:e:t:J:S|1:2:3:4:5:6:7|2:3:4:5:6:7:8|
The `txt` output type uses the tab character as the field separator. If the `codepage` library is available (included in the full distribution but not the core), the output will be encoded in `CP1200` and the BOM will be prepended.

`XLSX.utils.sheet_to_txt` takes the same arguments as `sheet_to_csv`.
As an alternative to the `writeFile` HTML type, `XLSX.utils.sheet_to_html` also produces HTML output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
id | Specify the id attribute for the TABLE element | |
editable | false | If true, set contenteditable="true" for every TD |
header | Override header (default html body ) | |
footer | Override footer (default /body /html ) |
Examples (click to show)
For the example sheet:
> console.log(XLSX.utils.sheet_to_html(ws));
// ...
`XLSX.utils.sheet_to_json` generates different types of JS objects. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
raw | true | Use raw values (true) or formatted strings (false) |
range | from WS | Override Range (see table below) |
header | Control output format (see table below) | |
dateNF | FMT 14 | Use specified date format in string output |
defval | Use specified value in place of null or undefined | |
blankrows | ** | Include blank lines in the output ** |
- `raw` only affects cells which have a format code (`.z`) field or a formatted text (`.w`) field.
- If `header` is specified, the first row is considered a data row; if `header` is not specified, the first row is the header row and not considered data.
- When `header` is not specified, the conversion will automatically disambiguate header entries by affixing `_` and a count starting at `1`. For example, if three columns have header `foo` the output fields are `foo`, `foo_1`, `foo_2`.
- `null` values are returned when `raw` is true but are skipped when false.
- If `defval` is not specified, null and undefined values are skipped normally. If specified, all null and undefined points will be filled with `defval`.
- When `header` is `1`, the default is to generate blank rows. `blankrows` must be set to `false` to skip blank rows.
- When `header` is not `1`, the default is to skip blank rows. `blankrows` must be true to generate blank rows.

`range` is expected to be one of:
range | Description |
---|---|
(number) | Use worksheet range but set starting row to the value |
(string) | Use specified range (A1-style bounded range string) |
(default) | Use worksheet range (ws['!ref'] ) |
`header` is expected to be one of:
header | Description |
---|---|
1 | Generate an array of arrays ("2D Array") |
"A" | Row object keys are literal column labels |
array of strings | Use specified strings as keys in row objects |
(default) | Read and disambiguate first row as keys |
If header is not `1`, the row object will contain the non-enumerable property `__rowNum__` that represents the row of the sheet corresponding to the entry.
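A sketch showing the property (the values depend on the sheet contents):

var rows = XLSX.utils.sheet_to_json(ws);
rows.forEach(function(row) {
  console.log(row.__rowNum__); // 0-indexed sheet row backing each object (1 for the first data row)
});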
Examples (click to show)
For the example sheet:
> XLSX.utils.sheet_to_json(ws);
[ { S: 1, h: 2, e: 3, e_1: 4, t: 5, J: 6, S_1: 7 },
{ S: 2, h: 3, e: 4, e_1: 5, t: 6, J: 7, S_1: 8 } ]
> XLSX.utils.sheet_to_json(ws, {header:"A"});
[ { A: 'S', B: 'h', C: 'e', D: 'e', E: 't', F: 'J', G: 'S' },
{ A: '1', B: '2', C: '3', D: '4', E: '5', F: '6', G: '7' },
{ A: '2', B: '3', C: '4', D: '5', E: '6', F: '7', G: '8' } ]
> XLSX.utils.sheet_to_json(ws, {header:["A","E","I","O","U","6","9"]});
[ { '6': 'J', '9': 'S', A: 'S', E: 'h', I: 'e', O: 'e', U: 't' },
{ '6': '6', '9': '7', A: '1', E: '2', I: '3', O: '4', U: '5' },
{ '6': '7', '9': '8', A: '2', E: '3', I: '4', O: '5', U: '6' } ]
> XLSX.utils.sheet_to_json(ws, {header:1});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ '1', '2', '3', '4', '5', '6', '7' ],
[ '2', '3', '4', '5', '6', '7', '8' ] ]
Example showing the effect of `raw`:
> ws['A2'].w = "3"; // set A2 formatted string value
> XLSX.utils.sheet_to_json(ws, {header:1, raw:false});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ '3', '2', '3', '4', '5', '6', '7' ], // <-- A2 uses the formatted string
[ '2', '3', '4', '5', '6', '7', '8' ] ]
> XLSX.utils.sheet_to_json(ws, {header:1});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ 1, 2, 3, 4, 5, 6, 7 ], // <-- A2 uses the raw value
[ 2, 3, 4, 5, 6, 7, 8 ] ]
Despite the library name `xlsx`, it supports numerous spreadsheet file formats:
Format | Read | Write |
---|---|---|
Excel Worksheet/Workbook Formats | :-----: | :-----: |
Excel 2007+ XML Formats (XLSX/XLSM) | ✔ | ✔ |
Excel 2007+ Binary Format (XLSB BIFF12) | ✔ | ✔ |
Excel 2003-2004 XML Format (XML "SpreadsheetML") | ✔ | ✔ |
Excel 97-2004 (XLS BIFF8) | ✔ | ✔ |
Excel 5.0/95 (XLS BIFF5) | ✔ | ✔ |
Excel 4.0 (XLS/XLW BIFF4) | ✔ | ✔ |
Excel 3.0 (XLS BIFF3) | ✔ | ✔ |
Excel 2.0/2.1 (XLS BIFF2) | ✔ | ✔ |
Excel Supported Text Formats | :-----: | :-----: |
Delimiter-Separated Values (CSV/TXT) | ✔ | ✔ |
Data Interchange Format (DIF) | ✔ | ✔ |
Symbolic Link (SYLK/SLK) | ✔ | ✔ |
Lotus Formatted Text (PRN) | ✔ | ✔ |
UTF-16 Unicode Text (TXT) | ✔ | ✔ |
Other Workbook/Worksheet Formats | :-----: | :-----: |
Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS) | ✔ | |
OpenDocument Spreadsheet (ODS) | ✔ | ✔ |
Flat XML ODF Spreadsheet (FODS) | ✔ | ✔ |
Uniform Office Format Spreadsheet (标文通 UOS1/UOS2) | ✔ | |
dBASE II/III/IV / Visual FoxPro (DBF) | ✔ | ✔ |
Lotus 1-2-3 (WK1/WK3) | ✔ | ✔ |
Lotus 1-2-3 (WKS/WK2/WK4/123) | ✔ | |
Quattro Pro Spreadsheet (WQ1/WQ2/WB1/WB2/WB3/QPW) | ✔ | |
Works 1.x-3.x DOS / 2.x-5.x Windows Spreadsheet (WKS) | ✔ | |
Works 6.x-9.x Spreadsheet (XLR) | ✔ | |
Other Common Spreadsheet Output Formats | :-----: | :-----: |
HTML Tables | ✔ | ✔ |
Rich Text Format tables (RTF) | ✔ | |
Ethercalc Record Format (ETH) | ✔ | ✔ |
Features not supported by a given file format will not be written. Formats with range limits will be silently truncated:
Format | Last Cell | Max Cols | Max Rows |
---|---|---|---|
Excel 2007+ XML Formats (XLSX/XLSM) | XFD1048576 | 16384 | 1048576 |
Excel 2007+ Binary Format (XLSB BIFF12) | XFD1048576 | 16384 | 1048576 |
Excel 97-2004 (XLS BIFF8) | IV65536 | 256 | 65536 |
Excel 5.0/95 (XLS BIFF5) | IV16384 | 256 | 16384 |
Excel 4.0 (XLS BIFF4) | IV16384 | 256 | 16384 |
Excel 3.0 (XLS BIFF3) | IV16384 | 256 | 16384 |
Excel 2.0/2.1 (XLS BIFF2) | IV16384 | 256 | 16384 |
Lotus 1-2-3 R2 - R5 (WK1/WK3/WK4) | IV8192 | 256 | 8192 |
Lotus 1-2-3 R1 (WKS) | IV2048 | 256 | 2048 |
Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.
File Format Details (click to show)
Core Spreadsheet Formats
XLSX and XLSM files are ZIP containers containing a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.
The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does not follow the specification, and there are additional documents discussing how Excel deviates from the specification.
BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced the concept of a workbook (`XLW` files) but also had a single-sheet `XLS` format. The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12 extended the format in various ways but largely stuck to the same record format.
There is no official specification for any of these formats. Excel 95 can write files in these formats, so record lengths and fields were determined by writing in all of the supported formats and comparing files. Excel 2016 can generate BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.
BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.
The `MS-XLS` specification covers the basics of the file format, and other specifications expand on serialization of features like properties.
Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although MS has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is pretty straightforward.
Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
The `MS-XLSB` specification covers the basics of the file format, and other specifications expand on serialization of features like properties.
Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel although they may not work in RFC4180 compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.
Excel TXT uses tab as the delimiter and code page 1200.
Like in Excel, files starting with `0x49 0x44 ("ID")` are treated as Symbolic Link files. Unlike Excel, if the file does not have a valid SYLK header, it will be proactively reinterpreted as CSV. There are some files with semicolon delimiters that align with a valid SYLK file. For the broadest compatibility, all cells with the value of `ID` are automatically wrapped in double-quotes.
Miscellaneous Workbook Formats
Support for other formats is generally far behind XLS/XLSB/XLSX support, due in part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.
The Lotus formats consist of binary records similar to the BIFF structure. Lotus did release a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing to Excel support.
Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.
Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.
The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like BIFF8 XLS.
All versions of Works were limited to a single worksheet.
Works for DOS 1.x - 3.x and Works for Windows 2.x extends the Lotus WKS format with additional record types.
Works for Windows 3.x - 5.x uses the same format and WKS extension. The BOF record has type `FF`.

Works for Windows 6.x - 9.x use the XLR format. XLR is nearly identical to BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves the exact Workbook stream for the XLR and the 97-2003 XLS export. Works 6 XLS includes two empty worksheets but the main worksheet has an identical encoding. XLR also includes a `WksSSWorkBook` stream similar to Lotus FM3/FMT files.
iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary XML-based format to the current file format based on the iWork Archive (IWA). This format has been used up through the current release (Numbers 11.2).
The parser focuses on extracting raw data from tables. Numbers technically supports multiple tables in a logical worksheet, including custom titles. This parser will generate one worksheet per Numbers table.
ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO add undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.
UOS is a very similar format, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.
Miscellaneous Worksheet Formats
Many older formats supported only one worksheet:
DBF is really a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer makes files compatible with Visual FoxPro extensions.
Multi-file extensions like external memos and tables are currently unsupported, limited by the general ability to read arbitrary files in the web browser. The reader understands DBF Level 7 extensions like DATETIME.
There is no real documentation. All knowledge was gathered by saving files in various versions of Excel to deduce the meaning of fields. Notes:
Plain formulae are stored in the RC form.
Column widths are rounded to integral characters.
Lotus Formatted Text (PRN)
There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless we can guess the column widths and reverse-engineer the original layout. Excel's 240 character width limitation is not enforced.
There is no unified definition. Visicalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:
Since Excel automatically converts numbers-as-strings to numbers, numeric string constants are converted to formulae: "0.3" -> "=""0.3""
DIF technically expects numeric cells to hold the raw numeric data, but Excel permits formatted numbers (including dates)
DIF technically has no support for formulae, but Excel will automatically convert plain formulae. Array formulae are not preserved.
HTML

Excel HTML worksheets include special metadata encoded in styles. For example, `mso-number-format` is a localized string containing the number format. Despite the metadata, the output is valid HTML, although it does accept bare `&` symbols.

The writer adds type metadata to the TD elements via the `t` tag. The parser looks for those tags and overrides the default interpretation. For example, text like `<td>12345</td>` will be parsed as numbers but `<td t="s">12345</td>` will be parsed as text.
Excel RTF worksheets are stored in clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.
Ethercalc is an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.
(click to show)
`make test` will run the node-based tests. By default it runs tests on files in every supported format. To test a specific file type, set `FMTS` to the format you want to test. Feature-specific tests are available with `make test_misc`:
$ make test_misc # run core tests
$ make test # run full tests
$ make test_xls # only use the XLS test files
$ make test_xlsx # only use the XLSX test files
$ make test_xlsb # only use the XLSB test files
$ make test_xml # only use the XML test files
$ make test_ods # only use the ODS test files
To enable all errors, set the environment variable `WTF=1`:
$ make test # run full tests
$ WTF=1 make test # enable all error messages
`flow` and `eslint` checks are available:
$ make lint # eslint checks
$ make flow # make lint + Flow checking
$ make tslint # check TS definitions
(click to show)
The core in-browser tests are available at `tests/index.html` within this repo. Start a local server and navigate to that directory to run the tests. `make ctestserv` will start a server on port 8000.

`make ctest` will generate the browser fixtures. To add more files, edit the `tests/fixtures.lst` file and add the paths.
To run the full in-browser tests, clone the repo for oss.sheetjs.com and replace the xlsx.js file (then open a browser window and go to stress.html):
$ cp xlsx.js ../SheetJS.github.io
$ cd ../SheetJS.github.io
$ simplehttpserver # or "python -mSimpleHTTPServer" or "serve"
$ open -a Chromium.app http://localhost:8000/stress.html
Tested Environments (click to show)
NodeJS 0.8, 0.10, 0.12, 4.x, 5.x, 6.x, 7.x, 8.x
Tests utilize the mocha testing framework.
The test suite also includes tests for various time zones. To change the timezone locally, set the TZ environment variable:
$ env TZ="Asia/Kolkata" WTF=1 make test_misc
Test files are housed in another repo. Running make init will refresh the test_files submodule and get the files. Note that this requires svn, git, hg and other commands that may not be available. If make init fails, please download the latest version of the test files snapshot from the repo.
Latest Snapshot (click to show)
Latest test files snapshot: http://github.com/SheetJS/test_files/releases/download/20170409/test_files.zip (download and unzip to the test_files subdirectory)
Due to the precarious nature of the Open Specifications Promise, it is very important to ensure code is cleanroom.
Contribution Notes
File organization (click to show)
At a high level, the final script is a concatenation of the individual files in the bits folder. Running make should reproduce the final output on all platforms. The README is similarly split into bits in the docbits folder.
Folders:
folder | contents |
---|---|
bits | raw source files that make up the final script |
docbits | raw markdown files that make up README.md |
bin | server-side bin scripts (xlsx.njs ) |
dist | dist files for web browsers and nonstandard JS environments |
demos | demo projects for platforms like ExtendScript and Webpack |
tests | browser tests (run make ctest to rebuild) |
types | typescript definitions and tests |
misc | miscellaneous supporting scripts |
test_files | test files (pulled from the test files repository) |
After cloning the repo, running make help will display a list of commands.
OSX/Linux (click to show)
The xlsx.js file is constructed from the files in the bits subdirectory. The build script (run make) will concatenate the individual bits to produce the script. Before submitting a contribution, ensure that running make will produce the xlsx.js file exactly. The simplest way to test is to add the script:
$ git add xlsx.js
$ make clean
$ make
$ git diff xlsx.js
To produce the dist files, run make dist. The dist files are updated in each version release and should not be committed between versions.
Windows (click to show)
The included make.cmd script will build xlsx.js from the bits directory. Building is as simple as:
> make
To prepare development environment:
> make init
The full list of commands available in Windows is displayed by make help:
make init -- install deps and global modules
make lint -- run eslint linter
make test -- run mocha test suite
make misc -- run smaller test suite
make book -- rebuild README and summary
make help -- display this message
As explained in Test Files, on Windows the release ZIP file must be downloaded and extracted. If Bash on Windows is available, it is possible to run the OSX/Linux workflow. The following steps prepare the environment:
# Install support programs for the build and test commands
sudo apt-get install make git subversion mercurial
# Install nodejs and NPM within the WSL
wget -qO- https://deb.nodesource.com/setup_8.x | sudo bash
sudo apt-get install nodejs
# Install dev dependencies
sudo npm install -g mocha voc blanket xlsjs
Tests (click to show)
The test_misc target (make test_misc on Linux/OSX, make misc on Windows) runs the targeted feature tests. It should take 5-10 seconds to perform feature tests without testing against the entire test battery. New features should be accompanied with tests for the relevant file formats and features.
For tests involving the read side, an appropriate feature test would involve reading an existing file and checking the resulting workbook object. If a parameter is involved, files should be read with different values to verify that the feature is working as expected.
For tests involving a new write feature which can already be parsed, appropriate feature tests would involve writing a workbook with the feature and then opening and verifying that the feature is preserved.
For tests involving a new write feature without an existing read ability, please add a feature test to the kitchen sink tests/write.js.
OSP-covered Specifications (click to show)
MS-CFB: Compound File Binary File Format
MS-CTXLS: Excel Custom Toolbar Binary File Format
MS-EXSPXML3: Excel Calculation Version 2 Web Service XML Schema
MS-ODATA: Open Data Protocol (OData)
MS-ODRAW: Office Drawing Binary File Format
MS-ODRAWXML: Office Drawing Extensions to Office Open XML Structure
MS-OE376: Office Implementation Information for ECMA-376 Standards Support
MS-OFFCRYPTO: Office Document Cryptography Structure
MS-OI29500: Office Implementation Information for ISO/IEC 29500 Standards Support
MS-OLEDS: Object Linking and Embedding (OLE) Data Structures
MS-OLEPS: Object Linking and Embedding (OLE) Property Set Data Structures
MS-OODF3: Office Implementation Information for ODF 1.2 Standards Support
MS-OSHARED: Office Common Data Types and Objects Structures
MS-OVBA: Office VBA File Format Structure
MS-XLDM: Spreadsheet Data Model File Format
MS-XLS: Excel Binary File Format (.xls) Structure Specification
MS-XLSB: Excel (.xlsb) Binary File Format
MS-XLSX: Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format
XLS: Microsoft Office Excel 97-2007 Binary File Format Specification
RTF: Rich Text Format
Author: SheetJS
Source Code: https://github.com/SheetJS/sheetjs
License: Apache-2.0 License
1643018220
Parser and writer for various spreadsheet formats. Pure-JS cleanroom implementation from official specifications, related documents, and test files. Emphasis on parsing and writing robustness, cross-format feature compatibility with a unified JS representation, and ES3/ES5 browser compatibility back to IE6.
This is the community version. We also offer a pro version with performance enhancements, additional features like styling, and dedicated support.
Community Translations of this README:
Supported File Formats
Diagram Legend (click to show)
In the browser, just add a script tag:
<script lang="javascript" src="dist/xlsx.full.min.js"></script>
CDN Availability (click to show)
CDN | URL |
---|---|
unpkg | https://unpkg.com/xlsx/ |
jsDelivr | https://jsdelivr.com/package/npm/xlsx |
CDNjs | https://cdnjs.com/libraries/xlsx |
packd | https://bundle.run/xlsx@latest?name=XLSX |
unpkg makes the latest version available at:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
With npm:
$ npm install xlsx
With bower:
$ bower install js-xlsx
The demos directory includes sample projects for:
Frameworks and APIs
angularjs
angular and ionic
knockout
meteor
react and react-native
vue 2.x and weex
XMLHttpRequest and fetch
nodejs server
databases and key/value stores
typed arrays and math
Bundlers and Tooling
Platforms and Integrations
electron application
nw.js application
Chrome / Chromium extensions
Adobe ExtendScript
Headless Browsers
canvas-datagrid
x-spreadsheet
Swift JSC and other engines
"serverless" functions
internet explorer
Other examples are included in the showcase.
Optional features (click to show)
The node version automatically requires modules for additional features. Some of these modules are rather large in size and are only needed in special circumstances, so they do not ship with the core. For browser use, they must be included directly:
<!-- international support from js-codepage -->
<script src="dist/cpexcel.js"></script>
An appropriate version for each dependency is included in the dist/ directory.
The complete single-file version is generated at dist/xlsx.full.min.js. A slimmer build is generated at dist/xlsx.mini.min.js. Compared to the full build:
Webpack and Browserify builds include optional modules by default. Webpack can be configured to remove support with resolve.alias:
/* uncomment the lines below to remove support */
resolve: {
alias: { "./dist/cpexcel.js": "" } // <-- omit international support
}
Since the library uses functions like Array#forEach, older browsers require shims to provide missing functions. To use the shim, add the shim before the script tag that loads xlsx.js:
<!-- add the shim first -->
<script type="text/javascript" src="shim.min.js"></script>
<!-- after the shim is referenced, add the library -->
<script type="text/javascript" src="xlsx.full.min.js"></script>
The script also includes IE_LoadFile and IE_SaveFile for loading and saving files in Internet Explorer versions 6-9. The xlsx.extendscript.js script bundles the shim in a format suitable for Photoshop and other Adobe products.
Philosophy (click to show)
Prior to SheetJS, APIs for processing spreadsheet files were format-specific. Third-party libraries either supported one format, or they involved a separate set of classes for each supported file type. Even though XLSB was introduced in Excel 2007, nothing outside of SheetJS or Excel supported the format.
To promote a format-agnostic view, SheetJS starts from a pure-JS representation that we call the "Common Spreadsheet Format". Emphasizing a uniform object representation enables new features like format conversion (reading an XLSX template and saving as XLS) and circumvents the mess of classes. By abstracting the complexities of the various formats, tools need not worry about the specific file type!
A simple object representation combined with careful coding practices enables use cases in older browsers and in alternative environments like ExtendScript and Web Workers. It is always tempting to use the latest and greatest features, but they tend to require the latest versions of browsers, limiting usability.
Utility functions capture common use cases like generating JS objects or HTML. Most simple operations should only require a few lines of code. More complex operations generally should be straightforward to implement.
Excel pushes the XLSX format as default starting in Excel 2007. However, there are other formats with more appealing properties. For example, the XLSB format is spiritually similar to XLSX but XLSB files often take less than half the space and open much faster! Even though an XLSX writer is available, other format writers are available so users can take advantage of the unique characteristics of each format.
The primary focus of the Community Edition is correct data interchange, focused on extracting data from any compatible data representation and exporting data in various formats suitable for any third party interface.
For parsing, the first step is to read the file. This involves acquiring the data and feeding it into the library. Here are a few common scenarios:
nodejs read a file (click to show)
readFile is only available in server environments. Browsers have no API for reading arbitrary files given a path, so another strategy must be used.
if(typeof require !== 'undefined') XLSX = require('xlsx');
var workbook = XLSX.readFile('test.xlsx');
/* DO SOMETHING WITH workbook HERE */
Photoshop ExtendScript read a file (click to show)
readFile wraps the File logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + '/' + 'test.xlsx');
/* DO SOMETHING WITH workbook HERE */
The extendscript demo includes a more complex example.
Browser read TABLE element from page (click to show)
The table_to_book and table_to_sheet utility functions take a DOM TABLE element and iterate through the child nodes.
var workbook = XLSX.utils.table_to_book(document.getElementById('tableau'));
/* DO SOMETHING WITH workbook HERE */
Multiple tables on a web page can be converted to individual worksheets:
/* create new workbook */
var workbook = XLSX.utils.book_new();
/* convert table 'table1' to worksheet named "Sheet1" */
var ws1 = XLSX.utils.table_to_sheet(document.getElementById('table1'));
XLSX.utils.book_append_sheet(workbook, ws1, "Sheet1");
/* convert table 'table2' to worksheet named "Sheet2" */
var ws2 = XLSX.utils.table_to_sheet(document.getElementById('table2'));
XLSX.utils.book_append_sheet(workbook, ws2, "Sheet2");
/* workbook now has 2 worksheets */
Alternatively, the HTML code can be extracted and parsed:
var htmlstr = document.getElementById('tableau').outerHTML;
var workbook = XLSX.read(htmlstr, {type:'string'});
Browser download file (ajax) (click to show)
Note: for a more complete example that works in older browsers, check the demo at http://oss.sheetjs.com/sheetjs/ajax.html. The xhr demo includes more examples with XMLHttpRequest and fetch.
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
}
req.send();
Browser drag-and-drop (click to show)
For modern browsers, Blob#arrayBuffer can read data from files:
async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
const data = await f.arrayBuffer();
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener('drop', handleDropAsync, false);
For maximal compatibility, the FileReader API should be used:
function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
var reader = new FileReader();
reader.onload = function(e) {
var workbook = XLSX.read(e.target.result);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener('drop', handleDrop, false);
Browser file upload form element (click to show)
Data from file input elements can be processed using the same APIs as in the drag-and-drop example.
Using Blob#arrayBuffer:
async function handleFileAsync(e) {
const file = e.target.files[0];
const data = await file.arrayBuffer();
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener('change', handleFileAsync, false);
Using FileReader:
function handleFile(e) {
var files = e.target.files, f = files[0];
var reader = new FileReader();
reader.onload = function(e) {
var workbook = XLSX.read(e.target.result);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
input_dom_element.addEventListener('change', handleFile, false);
The oldie demo shows an IE-compatible fallback scenario. More specialized cases, including mobile app file processing, are covered in the included demos. Note that older versions of IE do not support the HTML5 File API, so the Base64 mode is used for testing.
Get Base64 encoding on OSX / Windows (click to show)
On OSX you can get the Base64 encoding with:
$ <target_file base64 | pbcopy
On Windows XP and up you can get the Base64 encoding using certutil:
> certutil -encode target_file target_file.b64
(note: You have to open the file and remove the header and footer lines)
Why is there no Streaming Read API? (click to show)
The most common and interesting formats (XLS, XLSX/M, XLSB, ODS) are ultimately ZIP or CFB containers of files. Neither format puts the directory structure at the beginning of the file: ZIP files place the Central Directory records at the end of the logical file, while CFB files can place the storage info anywhere in the file! As a result, to properly handle these formats, a streaming function would have to buffer the entire file before commencing. That belies the expectations of streaming, so we do not provide any streaming read API.
When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end. This can be done with a temporary file or by explicitly concatenating the stream:
Explicitly concatenating streams (click to show)
var fs = require('fs');
var XLSX = require('xlsx');
function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
var buffers = [];
stream.on('data', function(data) { buffers.push(data); });
stream.on('end', function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer, {type:"buffer"});
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
More robust solutions are available using modules like concat-stream.
Writing to filesystem first (click to show)
This example uses tempfile to generate file names:
var fs = require('fs'), tempfile = require('tempfile');
var XLSX = require('xlsx');
function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
var fname = tempfile('.sheetjs');
console.log(fname);
var ostream = fs.createWriteStream(fname);
stream.pipe(ostream);
ostream.on('finish', function() {
var workbook = XLSX.readFile(fname);
fs.unlinkSync(fname);
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
The full object format is described later in this README.
Reading a specific cell (click to show)
This example extracts the value stored in cell A1 from the first worksheet:
var first_sheet_name = workbook.SheetNames[0];
var address_of_cell = 'A1';
/* Get worksheet */
var worksheet = workbook.Sheets[first_sheet_name];
/* Find desired cell */
var desired_cell = worksheet[address_of_cell];
/* Get the value */
var desired_value = (desired_cell ? desired_cell.v : undefined);
Adding a new worksheet to a workbook (click to show)
This example uses XLSX.utils.aoa_to_sheet to make a sheet and XLSX.utils.book_append_sheet to append the sheet to the workbook:
var ws_name = "SheetJS";
/* make worksheet */
var ws_data = [
[ "S", "h", "e", "e", "t", "J", "S" ],
[ 1 , 2 , 3 , 4 , 5 ]
];
var ws = XLSX.utils.aoa_to_sheet(ws_data);
/* Add the worksheet to the workbook */
XLSX.utils.book_append_sheet(wb, ws, ws_name);
Creating a new workbook from scratch (click to show)
The workbook object contains a SheetNames array of names and a Sheets object mapping sheet names to sheet objects. The XLSX.utils.book_new utility function creates a new workbook object:
/* create a new blank workbook */
var wb = XLSX.utils.book_new();
The new workbook is blank and contains no worksheets. The write functions will error if the workbook is empty.
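A minimal sketch that builds a non-empty workbook before writing, combining book_new with the aoa_to_sheet and book_append_sheet utilities shown above (sheet contents are hypothetical):
var wb = XLSX.utils.book_new();
var ws = XLSX.utils.aoa_to_sheet([["Name","Value"],["SheetJS",42]]);
XLSX.utils.book_append_sheet(wb, ws, "Sheet1");
/* wb now has one worksheet and can be passed to the write functions */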
https://sheetjs.com/demos/modify.html read + modify + write files
https://github.com/SheetJS/sheetjs/blob/HEAD/bin/xlsx.njs node
The node version installs a command line tool xlsx which can read spreadsheet files and output the contents in various formats. The source is available at xlsx.njs in the bin directory.
Some helper functions in XLSX.utils generate different views of the sheets:
- XLSX.utils.sheet_to_csv generates CSV
- XLSX.utils.sheet_to_txt generates UTF16 Formatted Text
- XLSX.utils.sheet_to_html generates HTML
- XLSX.utils.sheet_to_json generates an array of objects
- XLSX.utils.sheet_to_formulae generates a list of formulae
For writing, the first step is to generate output data. The helper functions write and writeFile will produce the data in various formats suitable for dissemination. The second step is to actually share the data with the end point. Assuming workbook is a workbook object:
nodejs write a file (click to show)
XLSX.writeFile uses fs.writeFileSync in server environments:
if(typeof require !== 'undefined') XLSX = require('xlsx');
/* output format determined by filename */
XLSX.writeFile(workbook, 'out.xlsb');
/* at this point, out.xlsb is a file that you can distribute */
Photoshop ExtendScript write a file (click to show)
writeFile wraps the File logic in Photoshop and other ExtendScript targets. The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* output format determined by filename */
XLSX.writeFile(workbook, 'out.xlsx');
/* at this point, out.xlsx is a file that you can distribute */
The extendscript demo includes a more complex example.
Browser add TABLE element to page (click to show)
The sheet_to_html utility function generates HTML code that can be added to any DOM element.
var worksheet = workbook.Sheets[workbook.SheetNames[0]];
var container = document.getElementById('tableau');
container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
Browser upload file (ajax) (click to show)
A complete example using XHR is included in the XHR demo, along with examples for fetch and wrapper libraries. This example assumes the server can handle Base64-encoded files (see the demo for a basic nodejs server):
/* in this example, send a base64 string to the server */
var wopts = { bookType:'xlsx', bookSST:false, type:'base64' };
var wbout = XLSX.write(workbook,wopts);
var req = new XMLHttpRequest();
req.open("POST", "/upload", true);
var formdata = new FormData();
formdata.append('file', 'test.xlsx'); // <-- server expects `file` to hold name
formdata.append('data', wbout); // <-- `data` holds the base64-encoded data
req.send(formdata);
Browser save file (click to show)
XLSX.writeFile wraps a few techniques for triggering a file save:
- The URL browser API creates an object URL for the file, which the library uses by creating a link and forcing a click. It is supported in modern browsers.
- msSaveBlob is an IE10+ API for triggering a file save.
- IE_FileSave uses VBScript and ActiveX to write a file in IE6+ for Windows XP and Windows 7. The shim must be included in the containing HTML page.
There is no standard way to determine if the actual file has been downloaded.
/* output format determined by filename */
XLSX.writeFile(workbook, 'out.xlsb');
/* at this point, out.xlsb will have been downloaded */
Browser save file (compatibility) (click to show)
XLSX.writeFile techniques work for most modern browsers as well as older IE. For much older browsers, there are workarounds implemented by wrapper libraries.
FileSaver.js implements saveAs. Note: XLSX.writeFile will automatically call saveAs if available.
/* bookType can be any supported output type */
var wopts = { bookType:'xlsx', bookSST:false, type:'array' };
var wbout = XLSX.write(workbook,wopts);
/* the saveAs call downloads a file on the local machine */
saveAs(new Blob([wbout],{type:"application/octet-stream"}), "test.xlsx");
Downloadify uses a Flash SWF button to generate local files, suitable for environments where ActiveX is unavailable:
Downloadify.create(id,{
/* other options are required! read the downloadify docs for more info */
filename: "test.xlsx",
data: function() { return XLSX.write(wb, {bookType:"xlsx", type:'base64'}); },
append: false,
dataType: 'base64'
});
The oldie demo shows an IE-compatible fallback scenario. The included demos cover mobile apps and other special deployments.
The streaming write functions are available in the XLSX.stream object. They take the same arguments as the normal write functions but return a Readable Stream. They are only exposed in NodeJS.
- XLSX.stream.to_csv is the streaming version of XLSX.utils.sheet_to_csv.
- XLSX.stream.to_html is the streaming version of XLSX.utils.sheet_to_html.
- XLSX.stream.to_json is the streaming version of XLSX.utils.sheet_to_json.
nodejs convert to CSV and write file (click to show)
var output_file_name = "out.csv";
var stream = XLSX.stream.to_csv(worksheet);
stream.pipe(fs.createWriteStream(output_file_name));
nodejs write JSON stream to screen (click to show)
/* to_json returns an object-mode stream */
var Transform = require('stream').Transform;
var stream = XLSX.stream.to_json(worksheet, {raw:true});
/* the following stream converts JS objects to text via JSON.stringify */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };
stream.pipe(conv); conv.pipe(process.stdout);
https://github.com/sheetjs/sheetaki pipes write streams to nodejs response.
XLSX is the exposed variable in the browser and the exported node variable.
XLSX.version is the version of the library (added by the build script).
XLSX.SSF is an embedded version of the format library.
XLSX.read(data, read_opts) attempts to parse data.
XLSX.readFile(filename, read_opts) attempts to read filename and parse.
Parse options are described in the Parsing Options section.
XLSX.write(wb, write_opts) attempts to write the workbook wb.
XLSX.writeFile(wb, filename, write_opts) attempts to write wb to filename. In browser-based environments, it will attempt to force a client-side download.
XLSX.writeFileAsync(wb, filename, o, cb) attempts to write wb to filename. If o is omitted, the writer will use the third argument as the callback.
XLSX.stream contains a set of streaming write functions.
Write options are described in the Writing Options section.
Utilities are available in the XLSX.utils object and are described in the Utility Functions section:
Importing:
- aoa_to_sheet converts an array of arrays of JS data to a worksheet.
- json_to_sheet converts an array of JS objects to a worksheet.
- table_to_sheet converts a DOM TABLE element to a worksheet.
- sheet_add_aoa adds an array of arrays of JS data to an existing worksheet.
- sheet_add_json adds an array of JS objects to an existing worksheet.
Exporting:
- sheet_to_json converts a worksheet object to an array of JSON objects.
- sheet_to_csv generates delimiter-separated-values output.
- sheet_to_txt generates UTF16 formatted text.
- sheet_to_html generates HTML output.
- sheet_to_formulae generates a list of the formulae (with value fallbacks).
Cell and cell address manipulation:
- format_cell generates the text value for a cell (using number formats).
- encode_row / decode_row converts between 0-indexed rows and 1-indexed rows.
- encode_col / decode_col converts between 0-indexed columns and column names.
- encode_cell / decode_cell converts cell addresses.
- encode_range / decode_range converts cell ranges.
SheetJS conforms to the Common Spreadsheet Format (CSF):
Cell address objects are stored as {c:C, r:R} where C and R are 0-indexed column and row numbers, respectively. For example, the cell address B5 is represented by the object {c:1, r:4}.
Cell range objects are stored as {s:S, e:E} where S is the first cell and E is the last cell in the range. The ranges are inclusive. For example, the range A3:B7 is represented by the object {s:{c:0, r:2}, e:{c:1, r:6}}. Utility functions perform a row-major walk of a sheet range:
for(var R = range.s.r; R <= range.e.r; ++R) {
for(var C = range.s.c; C <= range.e.c; ++C) {
var cell_address = {c:C, r:R};
/* if an A1-style address is needed, encode the address */
var cell_ref = XLSX.utils.encode_cell(cell_address);
}
}
Cell objects are plain JS objects with keys and values following the convention:
Key | Description |
---|---|
v | raw value (see Data Types section for more info) |
w | formatted text (if applicable) |
t | type: b Boolean, e Error, n Number, d Date, s Text, z Stub |
f | cell formula encoded as an A1-style string (if applicable) |
F | range of enclosing array if formula is array formula (if applicable) |
r | rich text encoding (if applicable) |
h | HTML rendering of the rich text (if applicable) |
c | comments associated with the cell |
z | number format string associated with the cell (if requested) |
l | cell hyperlink object (.Target holds link, .Tooltip is tooltip) |
s | the style/theme of the cell (if applicable) |
Built-in export utilities (such as the CSV exporter) will use the w text if it is available. To change a value, be sure to delete cell.w (or set it to undefined) before attempting to export. The utilities will regenerate the w text from the number format (cell.z) and the raw value if possible.
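For example, a short sketch of updating a value so exports pick up the change (the cell address is hypothetical):
var cell = worksheet["B2"];
cell.v = 3.14159;   // new raw value
delete cell.w;      // remove the stale formatted text
/* exporters will regenerate w from cell.z and the new cell.v */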
The actual array formula is stored in the f field of the first cell in the array range. Other cells in the range will omit the f field.
The raw value is stored in the v value property, interpreted based on the t type property. This separation allows for representation of numbers as well as numeric text. There are 6 valid cell types:
Type | Description |
---|---|
b | Boolean: value interpreted as JS boolean |
e | Error: value is a numeric code and w property stores common name ** |
n | Number: value is a JS number ** |
d | Date: value is a JS Date object or string to be parsed as Date ** |
s | Text: value interpreted as JS string and written as text ** |
z | Stub: blank stub cell that is ignored by data processing utilities ** |
Error values and interpretation (click to show)
Value | Error Meaning |
---|---|
0x00 | #NULL! |
0x07 | #DIV/0! |
0x0F | #VALUE! |
0x17 | #REF! |
0x1D | #NAME? |
0x24 | #NUM! |
0x2A | #N/A |
0x2B | #GETTING_DATA |
Type n is the Number type. This includes all forms of data that Excel stores as numbers, such as dates/times and Boolean fields. Excel exclusively uses data that can be fit in an IEEE754 floating point number, just like JS Number, so the v field holds the raw number. The w field holds formatted text. Dates are stored as numbers by default and converted with XLSX.SSF.parse_date_code.
Type d is the Date type, generated only when the option cellDates is passed. Since JSON does not have a natural Date type, parsers are generally expected to store ISO 8601 Date strings like you would get from date.toISOString(). On the other hand, writers and exporters should be able to handle date strings and JS Date objects. Note that Excel disregards timezone modifiers and treats all dates in the local timezone. The library does not correct for this error.
Type s is the String type. Values are explicitly stored as text. Excel will interpret these cells as "number stored as text". Generated Excel files automatically suppress that class of error, but other formats may elicit errors.
Type z represents blank stub cells. They are generated in cases where cells have no assigned value but hold comments or other metadata. They are ignored by the core library data processing utility functions. By default these cells are not generated; the parser sheetStubs option must be set to true.
Excel Date Code details (click to show)
By default, Excel stores dates as numbers with a format code that specifies date processing. For example, the date 19-Feb-17 is stored as the number 42785 with a number format of d-mmm-yy. The SSF module understands number formats and performs the appropriate conversion.
XLSX also supports a special date type d where the data is an ISO 8601 date string. The formatter converts the date back to a number.
The default behavior for all parsers is to generate number cells. Setting cellDates to true will force the generators to store dates.
Time Zones and Dates (click to show)
Excel has no native concept of universal time. All times are specified in the local time zone. Excel limitations prevent specifying true absolute dates.
Following Excel, this library treats all dates as relative to local time zone.
Epochs: 1900 and 1904 (click to show)
Excel supports two epochs (January 1 1900 and January 1 1904). The workbook's epoch can be determined by examining the workbook's wb.Workbook.WBProps.date1904 property:
!!(((wb.Workbook||{}).WBProps||{}).date1904)
Each key that does not start with ! maps to a cell (using A-1 notation). sheet[address] returns the cell object for the specified address.
Special sheet keys (accessible as sheet[key], each starting with !):
sheet['!ref']: A-1 based range representing the sheet range. Functions that work with sheets should use this parameter to determine the range. Cells that are assigned outside of the range are not processed. In particular, when writing a sheet by hand, cells outside of the range are not included.
Functions that handle sheets should test for the presence of the !ref field. If the !ref is omitted or is not a valid range, functions are free to treat the sheet as empty or attempt to guess the range. The standard utilities that ship with this library treat sheets as empty (for example, the CSV output is an empty string).
When reading a worksheet with the sheetRows property set, the ref parameter will use the restricted range. The original range is set at ws['!fullref'].
sheet['!margins']: Object representing the page margins. The default values follow Excel's "normal" preset. Excel also has a "wide" and a "narrow" preset but they are stored as raw measurements. The main properties are listed below:
Page margin details (click to show)
key | description | "normal" | "wide" | "narrow" |
---|---|---|---|---|
left | left margin (inches) | 0.7 | 1.0 | 0.25 |
right | right margin (inches) | 0.7 | 1.0 | 0.25 |
top | top margin (inches) | 0.75 | 1.0 | 0.75 |
bottom | bottom margin (inches) | 0.75 | 1.0 | 0.75 |
header | header margin (inches) | 0.3 | 0.5 | 0.3 |
footer | footer margin (inches) | 0.3 | 0.5 | 0.3 |
/* Set worksheet sheet to "normal" */
ws["!margins"]={left:0.7, right:0.7, top:0.75,bottom:0.75,header:0.3,footer:0.3}
/* Set worksheet sheet to "wide" */
ws["!margins"]={left:1.0, right:1.0, top:1.0, bottom:1.0, header:0.5,footer:0.5}
/* Set worksheet sheet to "narrow" */
ws["!margins"]={left:0.25,right:0.25,top:0.75,bottom:0.75,header:0.3,footer:0.3}
In addition to the base sheet keys, worksheets also add:
ws['!cols']: array of column properties objects. Column widths are actually stored in files in a normalized manner, measured in terms of the "Maximum Digit Width" (the largest width of the rendered digits 0-9, in pixels). When parsed, the column objects store the pixel width in the wpx field, the character width in the wch field, and the maximum digit width in the MDW field.
ws['!rows']: array of row properties objects as explained later in the docs. Each row object encodes properties including row height and visibility.
ws['!merges']: array of range objects corresponding to the merged cells in the worksheet. Plain text formats do not support merge cells. CSV export will write all cells in the merge range if they exist, so be sure that only the first cell (upper-left) in the range is set.
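A minimal sketch that merges A1:B2, using decode_range to build the range object (ws is assumed to be a worksheet):
/* merge A1:B2; only the upper-left cell A1 should hold a value */
if(!ws["!merges"]) ws["!merges"] = [];
ws["!merges"].push(XLSX.utils.decode_range("A1:B2"));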
ws['!outline']: configure how outlines should behave. Options default to the default settings in Excel 2019:
key | Excel feature | default |
---|---|---|
above | Uncheck "Summary rows below detail" | false |
left | Uncheck "Summary rows to the right of detail" | false |
ws['!protect']: object of write sheet protection properties. The password key specifies the password for formats that support password-protected sheets (XLSX/XLSB/XLS). The writer uses the XOR obfuscation method. The following keys control the sheet protection -- set to false to enable a feature when the sheet is locked or set to true to disable a feature:
Worksheet Protection Details (click to show)
key | feature (true=disabled / false=enabled) | default |
---|---|---|
selectLockedCells | Select locked cells | enabled |
selectUnlockedCells | Select unlocked cells | enabled |
formatCells | Format cells | disabled |
formatColumns | Format columns | disabled |
formatRows | Format rows | disabled |
insertColumns | Insert columns | disabled |
insertRows | Insert rows | disabled |
insertHyperlinks | Insert hyperlinks | disabled |
deleteColumns | Delete columns | disabled |
deleteRows | Delete rows | disabled |
sort | Sort | disabled |
autoFilter | Filter | disabled |
pivotTables | Use PivotTable reports | disabled |
objects | Edit objects | enabled |
scenarios | Edit scenarios | enabled |
ws['!autofilter']: AutoFilter object following the schema:
type AutoFilter = {
  ref:string; // A-1 based range representing the AutoFilter table range
}
Chartsheets are represented as standard sheets. They are distinguished with the !type property set to "chart".
The underlying data and !ref refer to the cached data in the chartsheet. The first row of the chartsheet is the underlying header.
Macrosheets are represented as standard sheets. They are distinguished with the !type property set to "macro".
Dialogsheets are represented as standard sheets. They are distinguished with the !type property set to "dialog".
workbook.SheetNames is an ordered list of the sheets in the workbook.
wb.Sheets[sheetname] returns an object representing the worksheet.
wb.Props is an object storing the standard properties. wb.Custprops stores custom properties. Since the XLS standard properties deviate from the XLSX standard, XLS parsing stores core properties in both places.
wb.Workbook stores workbook-level attributes.
The various file formats use different internal names for file properties. The workbook Props object normalizes the names:
File Properties (click to show)
JS Name | Excel Description |
---|---|
Title | Summary tab "Title" |
Subject | Summary tab "Subject" |
Author | Summary tab "Author" |
Manager | Summary tab "Manager" |
Company | Summary tab "Company" |
Category | Summary tab "Category" |
Keywords | Summary tab "Keywords" |
Comments | Summary tab "Comments" |
LastAuthor | Statistics tab "Last saved by" |
CreatedDate | Statistics tab "Created" |
For example, to set the workbook title property:
if(!wb.Props) wb.Props = {};
wb.Props.Title = "Insert Title Here";
Custom properties are added in the workbook Custprops object:
if(!wb.Custprops) wb.Custprops = {};
wb.Custprops["Custom Property"] = "Custom Value";
Writers will process the Props key of the options object:
/* force the Author to be "SheetJS" */
XLSX.write(wb, {Props:{Author:"SheetJS"}});
wb.Workbook stores workbook-level attributes. wb.Workbook.Names is an array of defined name objects which have the keys:
Defined Name Properties (click to show)
Key | Description |
---|---|
Sheet | Name scope. Sheet Index (0 = first sheet) or null (Workbook) |
Name | Case-sensitive name. Standard rules apply ** |
Ref | A1-style Reference ("Sheet1!$A$1:$D$20" ) |
Comment | Comment (only applicable for XLS/XLSX/XLSB) |
Excel allows two sheet-scoped defined names to share the same name. However, a sheet-scoped name cannot collide with a workbook-scope name. Workbook writers may not enforce this constraint.
wb.Workbook.Views is an array of workbook view objects which have the keys:
Key | Description |
---|---|
RTL | If true, display right-to-left |
wb.Workbook.WBProps holds other workbook properties:
Key | Description |
---|---|
CodeName | VBA Project Workbook Code Name |
date1904 | epoch: 0/false for 1900 system, 1/true for 1904 |
filterPrivacy | Warn or strip personally identifying info on save |
Even for basic features like date storage, the official Excel formats store the same content in different ways. The parsers are expected to convert from the underlying file format representation to the Common Spreadsheet Format. Writers are expected to convert from CSF back to the underlying file format.
The A1-style formula string is stored in the f field. Even though different file formats store the formulae in different ways, the formats are translated. Even though some formats store formulae with a leading equal sign, CSF formulae do not start with =.
Representation of A1=1, A2=2, A3=A1+A2 (click to show)
{
"!ref": "A1:A3",
A1: { t:'n', v:1 },
A2: { t:'n', v:2 },
A3: { t:'n', v:3, f:'A1+A2' }
}
Shared formulae are decompressed and each cell has the formula corresponding to its cell. Writers generally do not attempt to generate shared formulae.
Cells with formula entries but no value will be serialized in a way that Excel and other spreadsheet tools will recognize. This library will not automatically compute formula results! For example, to compute BESSELJ in a worksheet:
Formula without known value (click to show)
{
"!ref": "A1:A3",
A1: { t:'n', v:3.14159 },
A2: { t:'n', v:2 },
A3: { t:'n', f:'BESSELJ(A1,A2)' }
}
Array Formulae
Array formulae are stored in the top-left cell of the array block. All cells of an array formula have a F field corresponding to the range. A single-cell formula can be distinguished from a plain formula by the presence of the F field.
Array Formula examples (click to show)
For example, setting the cell C1 to the array formula {=SUM(A1:A3*B1:B3)}:
worksheet['C1'] = { t:'n', f: "SUM(A1:A3*B1:B3)", F:"C1:C1" };
For a multi-cell array formula, every cell has the same array range but only the first cell specifies the formula. Consider D1:D3=A1:A3*B1:B3:
worksheet['D1'] = { t:'n', F:"D1:D3", f:"A1:A3*B1:B3" };
worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
Utilities and writers are expected to check for the presence of a F field and ignore any possible formula element f in cells other than the starting cell. They are not expected to perform validation of the formulae!
Formula Output Utility Function (click to show)
The sheet_to_formulae method generates one line per formula or array formula. Array formulae are rendered in the form range=formula while plain cells are rendered in the form cell=formula or value. Note that string literals are prefixed with an apostrophe ', consistent with Excel's formula bar display.
Formulae File Format Details (click to show)
Storage Representation | Formats | Read | Write |
---|---|---|---|
A1-style strings | XLSX | ✔ | ✔ |
RC-style strings | XLML and plain text | ✔ | ✔ |
BIFF Parsed formulae | XLSB and all XLS formats | ✔ | |
OpenFormula formulae | ODS/FODS/UOS | ✔ | ✔ |
Lotus Parsed formulae | All Lotus WK_ formats | ✔ |
Since Excel prohibits named cells from colliding with names of A1 or RC style cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed formulae and Lotus Parsed formulae have to be explicitly unwound. OpenFormula formulae can be converted with regular expressions.
The !cols array in each worksheet, if present, is a collection of ColInfo objects which have the following properties:
type ColInfo = {
/* visibility */
hidden?: boolean; // if true, the column is hidden
/* column width is specified in one of the following ways: */
wpx?: number; // width in screen pixels
width?: number; // width in Excel's "Max Digit Width", width*256 is integral
wch?: number; // width in characters
/* other fields for preserving features from files */
level?: number; // 0-indexed outline / group level
MDW?: number; // Excel's "Max Digit Width" unit, always integral
};
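For example, a sketch that hides the first column and sets character / pixel widths for the next two (ws is assumed to be a worksheet):
ws["!cols"] = [
  { hidden: true }, // column A hidden
  { wch: 20 },      // column B: 20 characters wide
  { wpx: 100 }      // column C: 100 pixels wide
];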
Why are there three width types? (click to show)
There are three different width types corresponding to the three different ways spreadsheets store column widths:
SYLK and other plain text formats use raw character count. Contemporaneous tools like Visicalc and Multiplan were character based. Since the characters had the same width, it sufficed to store a count. This tradition was continued into the BIFF formats.
SpreadsheetML (2003) tried to align with HTML by standardizing on screen pixel count throughout the file. Column widths, row heights, and other measures use pixels. When the pixel and character counts do not align, Excel rounds values.
XLSX internally stores column widths in a nebulous "Max Digit Width" form. The Max Digit Width is the width of the largest digit when rendered (generally the "0" character is the widest). The internal width must be an integer multiple of the width divided by 256. ECMA-376 describes a formula for converting between pixels and the internal width. This represents a hybrid approach.
Read functions attempt to populate all three properties. Write functions will try to cycle specified values to the desired type. In order to avoid potential conflicts, manipulation should delete the other properties first. For example, when changing the pixel width, delete the wch and width properties.
Implementation details (click to show)
Given the constraints, it is possible to determine the MDW without actually inspecting the font! The parsers guess the pixel width by converting from width to pixels and back, repeating for all possible MDW and selecting the MDW that minimizes the error. XLML actually stores the pixel width, so the guess works in the opposite direction.
Even though all of the information is made available, writers are expected to follow the priority order:
1. use the width field if available
2. use the wpx pixel width if available
3. use the wch character count if available
The !rows array in each worksheet, if present, is a collection of RowInfo objects which have the following properties:
type RowInfo = {
/* visibility */
hidden?: boolean; // if true, the row is hidden
/* row height is specified in one of the following ways: */
hpx?: number; // height in screen pixels
hpt?: number; // height in points
level?: number; // 0-indexed outline / group level
};
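For example, a sketch that sets a taller first row and hides the second (ws is assumed to be a worksheet):
ws["!rows"] = [
  { hpx: 24 },      // row 1: 24 pixels tall
  { hidden: true }  // row 2 hidden
];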
Note: the Excel UI displays the base outline level as 1 and the max level as 8. The level field stores the base outline as 0 and the max level as 7.
Implementation details (click to show)
Excel internally stores row heights in points. The default resolution is 72 DPI or 96 PPI, so the pixel and point size should agree. For different resolutions they may not agree, so the library separates the concepts.
Even though all of the information is made available, writers are expected to follow the priority order:
1. use the hpx pixel height if available
2. use the hpt point height if available
The cell.w formatted text for each cell is produced from cell.v and the cell.z format. If the format is not specified, the Excel General format is used. The format can either be specified as a string or as an index into the format table. Parsers are expected to populate workbook.SSF with the number format table. Writers are expected to serialize the table.
Custom tools should ensure that the local table has each used format string somewhere in the table. Excel convention mandates that the custom formats start at index 164. The following example creates a custom format from scratch:
New worksheet with custom format (click to show)
var wb = {
SheetNames: ["Sheet1"],
Sheets: {
Sheet1: {
"!ref":"A1:C1",
A1: { t:"n", v:10000 }, // <-- General format
B1: { t:"n", v:10000, z: "0%" }, // <-- Builtin format
C1: { t:"n", v:10000, z: "\"T\"\ #0.00" } // <-- Custom format
}
}
}
The rules are slightly different from how Excel displays custom number formats. In particular, literal characters must be wrapped in double quotes or preceded by a backslash. For more info, see the Excel documentation article Create or delete a custom number format or ECMA-376 18.8.31 (Number Formats).
Default Number Formats (click to show)
The default formats are listed in ECMA-376 18.8.30:
ID | Format |
---|---|
0 | General |
1 | 0 |
2 | 0.00 |
3 | #,##0 |
4 | #,##0.00 |
9 | 0% |
10 | 0.00% |
11 | 0.00E+00 |
12 | # ?/? |
13 | # ??/?? |
14 | m/d/yy (see below) |
15 | d-mmm-yy |
16 | d-mmm |
17 | mmm-yy |
18 | h:mm AM/PM |
19 | h:mm:ss AM/PM |
20 | h:mm |
21 | h:mm:ss |
22 | m/d/yy h:mm |
37 | #,##0 ;(#,##0) |
38 | #,##0 ;[Red](#,##0) |
39 | #,##0.00;(#,##0.00) |
40 | #,##0.00;[Red](#,##0.00) |
45 | mm:ss |
46 | [h]:mm:ss |
47 | mmss.0 |
48 | ##0.0E+0 |
49 | @ |
Format 14 (m/d/yy) is localized by Excel: even though the file specifies that number format, it will be drawn differently based on system settings. It makes sense when the producer and consumer of files are in the same locale, but that is not always the case over the Internet. To get around this ambiguity, parse functions accept the dateNF option to override the interpretation of that specific format string.
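A minimal sketch of the override (the file name is hypothetical):
/* render date code 14 as ISO-style dates instead of the locale default */
var wb = XLSX.readFile('dates.xlsx', { dateNF: 'yyyy-mm-dd' });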
Format Support (click to show)
Cell Hyperlinks: XLSX/M, XLSB, BIFF8 XLS, XLML, ODS
Tooltips: XLSX/M, XLSB, BIFF8 XLS, XLML
Hyperlinks are stored in the l key of cell objects. The Target field of the hyperlink object is the target of the link, including the URI fragment. Tooltips are stored in the Tooltip field and are displayed when you move your mouse over the text.
For example, the following snippet creates a link from cell A1 to https://sheetjs.com with the tip "Find us @ SheetJS.com!":
ws['A1'].l = { Target:"https://sheetjs.com", Tooltip:"Find us @ SheetJS.com!" };
Note that Excel does not automatically style hyperlinks -- they will generally be displayed as normal text.
Remote Links
HTTP / HTTPS links can be used directly:
ws['A2'].l = { Target:"https://docs.sheetjs.com/#hyperlinks" };
ws['A3'].l = { Target:"http://localhost:7262/yes_localhost_works" };
Excel also supports mailto email links with subject line:
ws['A4'].l = { Target:"mailto:ignored@dev.null" };
ws['A5'].l = { Target:"mailto:ignored@dev.null?subject=Test Subject" };
Local Links
Links to absolute paths should use the file:// URI scheme:
ws['B1'].l = { Target:"file:///SheetJS/t.xlsx" }; /* Link to /SheetJS/t.xlsx */
ws['B2'].l = { Target:"file:///c:/SheetJS.xlsx" }; /* Link to c:\SheetJS.xlsx */
Links to relative paths can be specified without a scheme:
ws['B3'].l = { Target:"SheetJS.xlsb" }; /* Link to SheetJS.xlsb */
ws['B4'].l = { Target:"../SheetJS.xlsm" }; /* Link to ../SheetJS.xlsm */
Relative paths have undefined behavior in the SpreadsheetML 2003 format. Excel 2019 will treat a ..\ parent mark as two levels up.
Internal Links
Links where the target is a cell or range or defined name in the same workbook ("Internal Links") are marked with a leading hash character:
ws['C1'].l = { Target:"#E2" }; /* Link to cell E2 */
ws['C2'].l = { Target:"#Sheet2!E2" }; /* Link to cell E2 in sheet Sheet2 */
ws['C3'].l = { Target:"#SomeDefinedName" }; /* Link to Defined Name */
Cell comments are objects stored in the c array of cell objects. The actual contents of the comment are split into blocks based on the comment author. The a field of each comment object is the author of the comment and the t field is the plain text representation.
For example, the following snippet appends a cell comment into cell A1:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"I'm a little comment, short and stout!"});
Note: XLSB enforces a 54 character limit on the Author name. Names longer than 54 characters may cause issues with other formats.
To mark a comment as normally hidden, set the hidden property:
if(!ws.A1.c) ws.A1.c = [];
ws.A1.c.push({a:"SheetJS", t:"This comment is visible"});
if(!ws.A2.c) ws.A2.c = [];
ws.A2.c.hidden = true;
ws.A2.c.push({a:"SheetJS", t:"This comment will be hidden"});
Excel enables hiding sheets in the lower tab bar. The sheet data is stored in the file but the UI does not readily make it available. Standard hidden sheets are revealed in the "Unhide" menu. Excel also has "very hidden" sheets which cannot be revealed in the menu. It is only accessible in the VB Editor!
The visibility setting is stored in the Hidden property of the sheet props array.
More details (click to show)
Value | Definition |
---|---|
0 | Visible |
1 | Hidden |
2 | Very Hidden |
With https://rawgit.com/SheetJS/test_files/HEAD/sheet_visibility.xlsx:
> wb.Workbook.Sheets.map(function(x) { return [x.name, x.Hidden] })
[ [ 'Visible', 0 ], [ 'Hidden', 1 ], [ 'VeryHidden', 2 ] ]
Non-Excel formats do not support the Very Hidden state. The best way to test if a sheet is visible is to check if the Hidden property is logical truth:
> wb.Workbook.Sheets.map(function(x) { return [x.name, !x.Hidden] })
[ [ 'Visible', true ], [ 'Hidden', false ], [ 'VeryHidden', false ] ]
VBA Macros are stored in a special data blob that is exposed in the vbaraw property of the workbook object when the bookVBA option is true. They are supported in the XLSM, XLSB, and BIFF8 XLS formats. The supported format writers automatically insert the data blob if it is present in the workbook and associate it with the worksheet names.
Custom Code Names (click to show)
The workbook code name is stored in wb.Workbook.WBProps.CodeName. By default, Excel will write ThisWorkbook or a translated phrase like DieseArbeitsmappe. Worksheet and Chartsheet code names are in the worksheet properties object at wb.Workbook.Sheets[i].CodeName. Macrosheets and Dialogsheets are ignored.
The readers and writers preserve the code names, but they have to be manually set when adding a VBA blob to a different workbook.
Macrosheets (click to show)
Older versions of Excel also supported a non-VBA "macrosheet" sheet type that stored automation commands. These are exposed in objects with the !type property set to "macro".
Detecting macros in workbooks (click to show)
The vbaraw field will only be set if macros are present, so testing is simple:
function wb_has_macro(wb/*:workbook*/)/*:boolean*/ {
if(!!wb.vbaraw) return true;
const sheets = wb.SheetNames.map((n) => wb.Sheets[n]);
return sheets.some((ws) => !!ws && ws['!type']=='macro');
}
The exported read and readFile functions accept an options argument:
Option Name | Default | Description |
---|---|---|
type | Input data encoding (see Input Type below) | |
raw | false | If true, plain text parsing will not parse values ** |
codepage | If specified, use code page when appropriate ** | |
cellFormula | true | Save formulae to the .f field |
cellHTML | true | Parse rich text and save HTML to the .h field |
cellNF | false | Save number format string to the .z field |
cellStyles | false | Save style/theme info to the .s field |
cellText | true | Generated formatted text to the .w field |
cellDates | false | Store dates as type d (default is n ) |
dateNF | If specified, use the string for date code 14 ** | |
sheetStubs | false | Create cell objects of type z for stub cells |
sheetRows | 0 | If >0, read the first sheetRows rows ** |
bookDeps | false | If true, parse calculation chains |
bookFiles | false | If true, add raw files to book object ** |
bookProps | false | If true, only parse enough to get book metadata ** |
bookSheets | false | If true, only parse enough to get the sheet names |
bookVBA | false | If true, copy VBA blob to vbaraw field ** |
password | "" | If defined and file is encrypted, use password ** |
WTF | false | If true, throw errors on unexpected file features ** |
sheets | If specified, only parse specified sheets ** | |
PRN | false | If true, allow parsing of PRN files ** |
xlfn | false | If true, preserve _xlfn. prefixes in formulae ** |
FS | DSV Field Separator override |
- Even if cellNF is false, formatted text will be generated and saved to .w
- In some cases, sheets may be parsed even if bookSheets is false.
- The raw option suppresses value parsing.
- bookSheets and bookProps combine to give both sets of information.
- Deps will be an empty object if bookDeps is false.
- bookFiles behavior depends on file type: keys array (paths in the ZIP) for ZIP-based formats; files hash (mapping paths to objects representing the files) for ZIP; cfb object for formats using CFB containers.
- sheetRows-1 rows will be generated when looking at the JSON object output (since the header row is counted as a row when parsing the data).
- sheets restricts based on input type: a number selects the zero-based sheet index (0 is the first worksheet).
- bookVBA merely exposes the raw VBA CFB object. It does not parse the data. XLSM and XLSB store the VBA CFB object in xl/vbaProject.bin. BIFF8 XLS mixes the VBA entries alongside the core Workbook entry, so the library generates a new XLSB-compatible blob from the XLS CFB container.
- codepage is applied to BIFF2 - BIFF5 files without CodePage records and to CSV files without BOM in type:"binary". BIFF8 XLS always defaults to 1200.
- PRN affects parsing of text files without a common delimiter character.
- Newer Excel functions are serialized with the _xlfn. prefix, hidden from the user. SheetJS will strip _xlfn. normally. The xlfn option preserves the prefixes.
- WTF:true forces errors on unexpected file features to be thrown.
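As a combined sketch (file name hypothetical), several of these options can be passed together:
var wb = XLSX.readFile("report.xlsx", {
  cellDates: true,  // store date cells as type 'd'
  sheetRows: 101,   // read at most the first 101 rows of each sheet
  cellNF: true      // keep number format strings in the .z field
});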
parameter for read
tells the library how to parse the data argument:
type | expected input |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n) ) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | array: array of 8-bit unsigned int (byte n is data[n] ) |
"file" | string: path of file that will be read (nodejs only) |
Implementation Details (click to show)
Excel and other spreadsheet tools read the first few bytes and apply other heuristics to determine a file type. This enables file type punning: renaming files with the .xls extension will tell your computer to use Excel to open the file but Excel will know how to handle it. This library applies similar logic:
Byte 0 | Raw File Type | Spreadsheet Types |
---|---|---|
0xD0 | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR |
0x09 | BIFF Stream | BIFF 2/3/4/5 |
0x3C | XML/HTML | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x50 | ZIP Archive | XLSB or XLSX/M or ODS or UOS2 or plain text |
0x49 | Plain Text | SYLK or plain text |
0x54 | Plain Text | DIF or plain text |
0xEF | UTF8 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0xFF | UTF16 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x00 | Record Stream | Lotus WK* or Quattro Pro or plain text |
0x7B | Plain text | RTF or plain text |
0x0A | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x0D | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x20 | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
DBF files are detected based on the first byte as well as the third and fourth bytes (corresponding to month and day of the file date)
Works for Windows files are detected based on the BOF record with type 0xFF
Plain text format guessing follows the priority order:
Format | Test |
---|---|
XML | <?xml appears in the first 1024 characters |
HTML | starts with < and HTML tags appear in the first 1024 characters * |
XML | starts with < and the first tag is valid |
RTF | starts with {\rt |
DSV | starts with /sep=.$/ , separator is the specified character |
DSV | more unquoted "|" chars than \t or , in the first 1024 |
DSV | more unquoted ; chars than \t or , in the first 1024 |
TSV | more unquoted \t chars than , chars in the first 1024 |
CSV | one of the first 1024 characters is a comma "," |
ETH | starts with socialcalc:version: |
PRN | PRN option is set to true |
CSV | (fallback) |
* HTML tags include: html, table, head, meta, script, style, div
Why are random text files valid? (click to show)
Excel is extremely aggressive in reading files. Adding an XLS extension to any display text file (where the only characters are ANSI display chars) tricks Excel into thinking that the file is potentially a CSV or TSV file, even if it is only one column! This library attempts to replicate that behavior.
The best approach is to validate the desired worksheet and ensure it has the expected number of rows or columns. Extracting the range is extremely simple:
var range = XLSX.utils.decode_range(worksheet['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;
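A quick guard can then reject workbooks with an unexpected shape. A sketch, where the wb variable and the 2-column minimum are illustrative assumptions:

var ws = wb.Sheets[wb.SheetNames[0]];
var range = XLSX.utils.decode_range(ws['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;
/* reject anything that does not match the expected layout */
if (ncols < 2 || nrows < 1) throw new Error("unexpected worksheet shape");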
The exported write
and writeFile
functions accept an options argument:
Option Name | Default | Description |
---|---|---|
type | Output data encoding (see Output Type below) | |
cellDates | false | Store dates as type d (default is n ) |
bookSST | false | Generate Shared String Table ** |
bookType | "xlsx" | Type of Workbook (see below for supported formats) |
sheet | "" | Name of Worksheet for single-sheet formats ** |
compression | false | Use ZIP compression for ZIP-based formats ** |
Props | | Override workbook properties when writing ** |
themeXLSX | | Override theme XML when writing XLSX/XLSB/XLSM ** |
ignoreEC | true | Suppress "number as text" errors ** |
- bookSST is slower and more memory intensive, but has better compatibility with older versions of iOS Numbers
- cellDates only applies to XLSX output and is not guaranteed to work with third-party readers. Excel itself does not usually write cells with type d, so non-Excel tools may ignore the data or error in the presence of dates.
- Props is an object mirroring the workbook Props field. See the table from the Workbook File Properties section.
- If specified, the string from themeXLSX will be saved as the primary theme for XLSX/XLSB/XLSM files (to xl/theme/theme1.xml in the ZIP)
- Due to a bug in the program, some features like "Text to Columns" will crash Excel on worksheets where error conditions are ignored. The writer will mark files to ignore the error by default. Set ignoreEC to false to suppress.

For broad compatibility with third-party tools, this library supports many output formats. The specific file type is controlled with the bookType option:
bookType | file ext | container | sheets | Description |
---|---|---|---|---|
xlsx | .xlsx | ZIP | multi | Excel 2007+ XML Format |
xlsm | .xlsm | ZIP | multi | Excel 2007+ Macro XML Format |
xlsb | .xlsb | ZIP | multi | Excel 2007+ Binary Format |
biff8 | .xls | CFB | multi | Excel 97-2004 Workbook Format |
biff5 | .xls | CFB | multi | Excel 5.0/95 Workbook Format |
biff4 | .xls | none | single | Excel 4.0 Worksheet Format |
biff3 | .xls | none | single | Excel 3.0 Worksheet Format |
biff2 | .xls | none | single | Excel 2.0 Worksheet Format |
xlml | .xls | none | multi | Excel 2003-2004 (SpreadsheetML) |
ods | .ods | ZIP | multi | OpenDocument Spreadsheet |
fods | .fods | none | multi | Flat OpenDocument Spreadsheet |
wk3 | .wk3 | none | single | Lotus Workbook (WK3) |
csv | .csv | none | single | Comma Separated Values |
txt | .txt | none | single | UTF-16 Unicode Text (TXT) |
sylk | .sylk | none | single | Symbolic Link (SYLK) |
html | .html | none | single | HTML Document |
dif | .dif | none | single | Data Interchange Format (DIF) |
dbf | .dbf | none | single | dBASE II + VFP Extensions (DBF) |
wk1 | .wk1 | none | single | Lotus Worksheet (WK1) |
rtf | .rtf | none | single | Rich Text Format (RTF) |
prn | .prn | none | single | Lotus Formatted Text |
eth | .eth | none | single | Ethercalc Record Format (ETH) |
- compression only applies to formats with ZIP containers.
- Formats that only support a single sheet require a sheet option specifying the worksheet. If the string is empty, the first worksheet is used.
- writeFile will automatically guess the output file format based on the file extension if bookType is not specified. It will choose the first format in the aforementioned table that matches the extension.

The type argument for write mirrors the type argument for read:
type | output |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n) ) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | ArrayBuffer, fallback array of 8-bit unsigned int |
"file" | string: path of file that will be created (nodejs only) |
The sheet_to_*
functions accept a worksheet and an optional options object.
The *_to_sheet
functions accept a data object and an optional options object.
The examples are based on the following worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
3 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
XLSX.utils.aoa_to_sheet takes an array of arrays of JS values and returns a worksheet resembling the input data. Numbers, Booleans and Strings are stored as cells of the corresponding types. Dates are stored as date cells or numbers. Array holes and explicit undefined values are skipped. null values may be stubbed. All other values are stored as strings. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetStubs | false | Create cell objects of type z for null values |
nullError | false | If true, emit #NULL! error cells for null values |
Examples (click to show)
To generate the example sheet:
var ws = XLSX.utils.aoa_to_sheet([
"SheetJS".split(""),
[1,2,3,4,5,6,7],
[2,3,4,5,6,7,8]
]);
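Dates follow the cellDates option from the table above. A small sketch (the values are illustrative):

var ws2 = XLSX.utils.aoa_to_sheet(
  [["when", new Date(2022, 0, 1)]],
  { cellDates: true } /* store the date as a type "d" cell instead of a number */
);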
XLSX.utils.sheet_add_aoa
takes an array of arrays of JS values and updates an existing worksheet object. It follows the same process as aoa_to_sheet
and accepts an options argument:
Option Name | Default | Description |
---|---|---|
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetStubs | false | Create cell objects of type z for null values |
nullError | false | If true, emit #NULL! error cells for null values |
origin | | Use specified cell as starting point (see below) |
origin
is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
Consider the worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | | | 5 | 6 | 7 |
3 | 2 | 3 | | | 6 | 7 | 8 |
4 | 3 | 4 | | | 7 | 8 | 9 |
5 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
This worksheet can be built up in the order A1:G1, A2:B4, E2:G4, A5:G5
:
/* Initial row */
var ws = XLSX.utils.aoa_to_sheet([ "SheetJS".split("") ]);
/* Write data starting at A2 */
XLSX.utils.sheet_add_aoa(ws, [[1,2], [2,3], [3,4]], {origin: "A2"});
/* Write data starting at E2 */
XLSX.utils.sheet_add_aoa(ws, [[5,6,7], [6,7,8], [7,8,9]], {origin:{r:1, c:4}});
/* Append row */
XLSX.utils.sheet_add_aoa(ws, [[4,5,6,7,8,9,0]], {origin: -1});
XLSX.utils.json_to_sheet
takes an array of objects and returns a worksheet with automatically-generated "headers" based on the keys of the objects. The default column order is determined by the first appearance of the field using Object.keys
. The function accepts an options argument:
Option Name | Default | Description |
---|---|---|
header | | Use specified field order (default Object.keys) ** |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
skipHeader | false | If true, do not include header row in output |
nullError | false | If true, emit #NULL! error cells for null values |
- All fields from each row will be written. If header is an array and it does not contain a particular field, the key will be appended to the array.
- Cell types are deduced from the type of each value. For example, a Date object will generate a Date cell, while a string will generate a Text cell.
- Null values will be skipped. If nullError is true, an error cell corresponding to #NULL! will be written to the worksheet.

Examples (click to show)
The original sheet cannot be reproduced using plain objects since JS object keys must be unique. After replacing the second e
and S
with e_1
and S_1
:
var ws = XLSX.utils.json_to_sheet([
{ S:1, h:2, e:3, e_1:4, t:5, J:6, S_1:7 },
{ S:2, h:3, e:4, e_1:5, t:6, J:7, S_1:8 }
], {header:["S","h","e","e_1","t","J","S_1"]});
Alternatively, the header row can be skipped:
var ws = XLSX.utils.json_to_sheet([
{ A:"S", B:"h", C:"e", D:"e", E:"t", F:"J", G:"S" },
{ A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7 },
{ A: 2, B: 3, C: 4, D: 5, E: 6, F: 7, G: 8 }
], {header:["A","B","C","D","E","F","G"], skipHeader:true});
XLSX.utils.sheet_add_json
takes an array of objects and updates an existing worksheet object. It follows the same process as json_to_sheet
and accepts an options argument:
Option Name | Default | Description |
---|---|---|
header | | Use specified column order (default Object.keys) |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
skipHeader | false | If true, do not include header row in output |
nullError | false | If true, emit #NULL! error cells for null values |
origin | | Use specified cell as starting point (see below) |
origin
is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
Consider the worksheet:
XXX| A | B | C | D | E | F | G |
---+---+---+---+---+---+---+---+
1 | S | h | e | e | t | J | S |
2 | 1 | 2 | | | 5 | 6 | 7 |
3 | 2 | 3 | | | 6 | 7 | 8 |
4 | 3 | 4 | | | 7 | 8 | 9 |
5 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
This worksheet can be built up in the order A1:G1, A2:B4, E2:G4, A5:G5
:
/* Initial row */
var ws = XLSX.utils.json_to_sheet([
{ A: "S", B: "h", C: "e", D: "e", E: "t", F: "J", G: "S" }
], {header: ["A", "B", "C", "D", "E", "F", "G"], skipHeader: true});
/* Write data starting at A2 */
XLSX.utils.sheet_add_json(ws, [
{ A: 1, B: 2 }, { A: 2, B: 3 }, { A: 3, B: 4 }
], {skipHeader: true, origin: "A2"});
/* Write data starting at E2 */
XLSX.utils.sheet_add_json(ws, [
{ A: 5, B: 6, C: 7 }, { A: 6, B: 7, C: 8 }, { A: 7, B: 8, C: 9 }
], {skipHeader: true, origin: { r: 1, c: 4 }, header: [ "A", "B", "C" ]});
/* Append row */
XLSX.utils.sheet_add_json(ws, [
{ A: 4, B: 5, C: 6, D: 7, E: 8, F: 9, G: 0 }
], {header: ["A", "B", "C", "D", "E", "F", "G"], skipHeader: true, origin: -1});
XLSX.utils.table_to_sheet
takes a table DOM element and returns a worksheet resembling the input table. Numbers are parsed. All other data will be stored as strings.
XLSX.utils.table_to_book
produces a minimal workbook based on the worksheet.
Both functions accept options arguments:
Option Name | Default | Description |
---|---|---|
raw | | If true, every cell will hold raw strings |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetRows | 0 | If >0, read the first sheetRows rows of the table |
display | false | If true, hidden rows and cells will not be parsed |
Examples (click to show)
To generate the example sheet, start with the HTML table:
<table id="sheetjs">
<tr><td>S</td><td>h</td><td>e</td><td>e</td><td>t</td><td>J</td><td>S</td></tr>
<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td></tr>
<tr><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td></tr>
</table>
To process the table:
var tbl = document.getElementById('sheetjs');
var wb = XLSX.utils.table_to_book(tbl);
Note: XLSX.read
can handle HTML represented as strings.
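For example, a sketch (the markup is illustrative):

var html = "<table><tr><td>S</td><td>h</td></tr></table>";
var wb = XLSX.read(html, { type: "string" });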
XLSX.utils.sheet_add_dom
takes a table DOM element and updates an existing worksheet object. It follows the same process as table_to_sheet
and accepts an options argument:
Option Name | Default | Description |
---|---|---|
raw | | If true, every cell will hold raw strings |
dateNF | FMT 14 | Use specified date format in string output |
cellDates | false | Store dates as type d (default is n ) |
sheetRows | 0 | If >0, read the first sheetRows rows of the table |
display | false | If true, hidden rows and cells will not be parsed |
origin | | Use specified cell as starting point (see below) |
origin
is expected to be one of:
origin | Description |
---|---|
(cell object) | Use specified cell (cell object) |
(string) | Use specified cell (A1-style cell) |
(number >= 0) | Start from the first column at specified row (0-indexed) |
-1 | Append to bottom of worksheet starting on first column |
(default) | Start from cell A1 |
Examples (click to show)
A small helper function can create gap rows between tables:
function create_gap_rows(ws, nrows) {
var ref = XLSX.utils.decode_range(ws["!ref"]); // get original range
ref.e.r += nrows; // add to ending row
ws["!ref"] = XLSX.utils.encode_range(ref); // reassign row
}
/* first table */
var ws = XLSX.utils.table_to_sheet(document.getElementById('table1'));
create_gap_rows(ws, 1); // one row gap after first table
/* second table */
XLSX.utils.sheet_add_dom(ws, document.getElementById('table2'), {origin: -1});
create_gap_rows(ws, 3); // three rows gap after second table
/* third table */
XLSX.utils.sheet_add_dom(ws, document.getElementById('table3'), {origin: -1});
XLSX.utils.sheet_to_formulae
generates an array of commands that represent how a person would enter data into an application. Each entry is of the form A1-cell-address=formula-or-value
. String literals are prefixed with a '
in accordance with Excel.
Examples (click to show)
For the example sheet:
> var o = XLSX.utils.sheet_to_formulae(ws);
> [o[0], o[5], o[10], o[15], o[20]];
[ 'A1=\'S', 'F1=\'J', 'D2=4', 'B3=3', 'G3=8' ]
As an alternative to the writeFile
CSV type, XLSX.utils.sheet_to_csv
also produces CSV output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
FS | "," | "Field Separator" delimiter between fields |
RS | "\n" | "Record Separator" delimiter between rows |
dateNF | FMT 14 | Use specified date format in string output |
strip | false | Remove trailing field separators in each record ** |
blankrows | true | Include blank lines in the CSV output |
skipHidden | false | Skips hidden rows/columns in the CSV output |
forceQuotes | false | Force quotes around fields |
- strip will remove trailing commas from each line under default FS/RS
- blankrows must be set to false to skip blank lines.
- forceQuotes forces all cells to be wrapped in quotes.

Examples (click to show)
For the example sheet:
> console.log(XLSX.utils.sheet_to_csv(ws));
S,h,e,e,t,J,S
1,2,3,4,5,6,7
2,3,4,5,6,7,8
> console.log(XLSX.utils.sheet_to_csv(ws, {FS:"\t"}));
S h e e t J S
1 2 3 4 5 6 7
2 3 4 5 6 7 8
> console.log(XLSX.utils.sheet_to_csv(ws,{FS:":",RS:"|"}));
S:h:e:e:t:J:S|1:2:3:4:5:6:7|2:3:4:5:6:7:8|
The txt
output type uses the tab character as the field separator. If the codepage
library is available (included in full distribution but not core), the output will be encoded in CP1200
and the BOM will be prepended.
XLSX.utils.sheet_to_txt takes the same arguments as sheet_to_csv.
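For example (assuming ws is the example worksheet):

var txt = XLSX.utils.sheet_to_txt(ws); /* tab-delimited; UTF-16 when the codepage library is available */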
As an alternative to the writeFile
HTML type, XLSX.utils.sheet_to_html
also produces HTML output. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
id | | Specify the id attribute for the TABLE element |
editable | false | If true, set contenteditable="true" for every TD |
header | | Override header (default html body) |
footer | | Override footer (default /body /html) |
Examples (click to show)
For the example sheet:
> console.log(XLSX.utils.sheet_to_html(ws));
// ...
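The options can be combined. A sketch (the id value is illustrative):

var html = XLSX.utils.sheet_to_html(ws, { id: "out-table", editable: true });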
XLSX.utils.sheet_to_json
generates different types of JS objects. The function takes an options argument:
Option Name | Default | Description |
---|---|---|
raw | true | Use raw values (true) or formatted strings (false) |
range | from WS | Override Range (see table below) |
header | | Control output format (see table below) |
dateNF | FMT 14 | Use specified date format in string output |
defval | | Use specified value in place of null or undefined |
blankrows | ** | Include blank lines in the output ** |
- raw only affects cells which have a format code (.z) field or a formatted text (.w) field.
- If header is specified, the first row is considered a data row; if header is not specified, the first row is the header row and not considered data.
- When header is not specified, the conversion will automatically disambiguate header entries by affixing _ and a count starting at 1. For example, if three columns have header foo the output fields are foo, foo_1, foo_2
- null values are returned when raw is true but are skipped when false.
- If defval is not specified, null and undefined values are skipped normally. If specified, all null and undefined points will be filled with defval
- When header is 1, the default is to generate blank rows. blankrows must be set to false to skip blank rows.
- When header is not 1, the default is to skip blank rows. blankrows must be true to generate blank rows

range is expected to be one of:
range | Description |
---|---|
(number) | Use worksheet range but set starting row to the value |
(string) | Use specified range (A1-style bounded range string) |
(default) | Use worksheet range (ws['!ref'] ) |
header
is expected to be one of:
header | Description |
---|---|
1 | Generate an array of arrays ("2D Array") |
"A" | Row object keys are literal column labels |
array of strings | Use specified strings as keys in row objects |
(default) | Read and disambiguate first row as keys |
If header is not 1
, the row object will contain the non-enumerable property __rowNum__
that represents the row of the sheet corresponding to the entry.
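For instance, a sketch using the example worksheet:

var rows = XLSX.utils.sheet_to_json(ws);
console.log(rows[0].__rowNum__); /* 1 -- the first data row is sheet row 2 (0-indexed 1) */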
Examples (click to show)
For the example sheet:
> XLSX.utils.sheet_to_json(ws);
[ { S: 1, h: 2, e: 3, e_1: 4, t: 5, J: 6, S_1: 7 },
{ S: 2, h: 3, e: 4, e_1: 5, t: 6, J: 7, S_1: 8 } ]
> XLSX.utils.sheet_to_json(ws, {header:"A"});
[ { A: 'S', B: 'h', C: 'e', D: 'e', E: 't', F: 'J', G: 'S' },
{ A: '1', B: '2', C: '3', D: '4', E: '5', F: '6', G: '7' },
{ A: '2', B: '3', C: '4', D: '5', E: '6', F: '7', G: '8' } ]
> XLSX.utils.sheet_to_json(ws, {header:["A","E","I","O","U","6","9"]});
[ { '6': 'J', '9': 'S', A: 'S', E: 'h', I: 'e', O: 'e', U: 't' },
{ '6': '6', '9': '7', A: '1', E: '2', I: '3', O: '4', U: '5' },
{ '6': '7', '9': '8', A: '2', E: '3', I: '4', O: '5', U: '6' } ]
> XLSX.utils.sheet_to_json(ws, {header:1});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ '1', '2', '3', '4', '5', '6', '7' ],
[ '2', '3', '4', '5', '6', '7', '8' ] ]
Example showing the effect of raw
:
> ws['A2'].w = "3"; // set A2 formatted string value
> XLSX.utils.sheet_to_json(ws, {header:1, raw:false});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ '3', '2', '3', '4', '5', '6', '7' ], // <-- A2 uses the formatted string
[ '2', '3', '4', '5', '6', '7', '8' ] ]
> XLSX.utils.sheet_to_json(ws, {header:1});
[ [ 'S', 'h', 'e', 'e', 't', 'J', 'S' ],
[ 1, 2, 3, 4, 5, 6, 7 ], // <-- A2 uses the raw value
[ 2, 3, 4, 5, 6, 7, 8 ] ]
Despite the library name xlsx
, it supports numerous spreadsheet file formats:
Format | Read | Write |
---|---|---|
Excel Worksheet/Workbook Formats | :-----: | :-----: |
Excel 2007+ XML Formats (XLSX/XLSM) | ✔ | ✔ |
Excel 2007+ Binary Format (XLSB BIFF12) | ✔ | ✔ |
Excel 2003-2004 XML Format (XML "SpreadsheetML") | ✔ | ✔ |
Excel 97-2004 (XLS BIFF8) | ✔ | ✔ |
Excel 5.0/95 (XLS BIFF5) | ✔ | ✔ |
Excel 4.0 (XLS/XLW BIFF4) | ✔ | ✔ |
Excel 3.0 (XLS BIFF3) | ✔ | ✔ |
Excel 2.0/2.1 (XLS BIFF2) | ✔ | ✔ |
Excel Supported Text Formats | :-----: | :-----: |
Delimiter-Separated Values (CSV/TXT) | ✔ | ✔ |
Data Interchange Format (DIF) | ✔ | ✔ |
Symbolic Link (SYLK/SLK) | ✔ | ✔ |
Lotus Formatted Text (PRN) | ✔ | ✔ |
UTF-16 Unicode Text (TXT) | ✔ | ✔ |
Other Workbook/Worksheet Formats | :-----: | :-----: |
OpenDocument Spreadsheet (ODS) | ✔ | ✔ |
Flat XML ODF Spreadsheet (FODS) | ✔ | ✔ |
Uniform Office Format Spreadsheet (标文通 UOS1/UOS2) | ✔ | |
dBASE II/III/IV / Visual FoxPro (DBF) | ✔ | ✔ |
Lotus 1-2-3 (WK1/WK3) | ✔ | ✔ |
Lotus 1-2-3 (WKS/WK2/WK4/123) | ✔ | |
Quattro Pro Spreadsheet (WQ1/WQ2/WB1/WB2/WB3/QPW) | ✔ | |
Works 1.x-3.x DOS / 2.x-5.x Windows Spreadsheet (WKS) | ✔ | |
Works 6.x-9.x Spreadsheet (XLR) | ✔ | |
Other Common Spreadsheet Output Formats | :-----: | :-----: |
HTML Tables | ✔ | ✔ |
Rich Text Format tables (RTF) | ✔ | |
Ethercalc Record Format (ETH) | ✔ | ✔ |
Features not supported by a given file format will not be written. Formats with range limits will be silently truncated:
Format | Last Cell | Max Cols | Max Rows |
---|---|---|---|
Excel 2007+ XML Formats (XLSX/XLSM) | XFD1048576 | 16384 | 1048576 |
Excel 2007+ Binary Format (XLSB BIFF12) | XFD1048576 | 16384 | 1048576 |
Excel 97-2004 (XLS BIFF8) | IV65536 | 256 | 65536 |
Excel 5.0/95 (XLS BIFF5) | IV16384 | 256 | 16384 |
Excel 4.0 (XLS BIFF4) | IV16384 | 256 | 16384 |
Excel 3.0 (XLS BIFF3) | IV16384 | 256 | 16384 |
Excel 2.0/2.1 (XLS BIFF2) | IV16384 | 256 | 16384 |
Lotus 1-2-3 R2 - R5 (WK1/WK3/WK4) | IV8192 | 256 | 8192 |
Lotus 1-2-3 R1 (WKS) | IV2048 | 256 | 2048 |
Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.
Excel 2007+ XML (XLSX/XLSM) (click to show)
XLSX and XLSM files are ZIP containers containing a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.
The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does not follow the specification, and there are additional documents discussing how Excel deviates from the specification.
Excel 2.0-95 (BIFF2/BIFF3/BIFF4/BIFF5) (click to show)
BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced the concept of a workbook (XLW
files) but also had single-sheet XLS
format. The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12 extended the format in various ways but largely stuck to the same record format.
There is no official specification for any of these formats. Excel 95 can write files in these formats, so record lengths and fields were determined by writing in all of the supported formats and comparing files. Excel 2016 can generate BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.
Excel 97-2004 Binary (BIFF8) (click to show)
BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.
The MS-XLS
specification covers the basics of the file format, and other specifications expand on serialization of features like properties.
Excel 2003-2004 (SpreadsheetML) (click to show)
Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although MS has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is pretty straightforward.
Excel 2007+ Binary (XLSB, BIFF12) (click to show)
Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
The MS-XLSB
specification covers the basics of the file format, and other specifications expand on serialization of features like properties.
Delimiter-Separated Values (CSV/TXT) (click to show)
Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel although they may not work in RFC4180 compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.
Excel TXT uses tab as the delimiter and code page 1200.
Notes:
- Files starting with 0x49 0x44 ("ID") are treated as Symbolic Link files. Unlike Excel, if the file does not have a valid SYLK header, it will be proactively reinterpreted as CSV. There are some files with semicolon delimiter that align with a valid SYLK file. For the broadest compatibility, all cells with the value of ID are automatically wrapped in double-quotes.

Other Workbook Formats (click to show)
Support for other formats is generally far behind XLS/XLSB/XLSX support, due in large part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.
Lotus 1-2-3 (WKS/WK1/WK2/WK3/WK4/123) (click to show)
The Lotus formats consist of binary records similar to the BIFF structure. Lotus did release a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing to Excel support.
Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.
Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.
Quattro Pro (WQ1/WQ2/WB1/WB2/WB3/QPW) (click to show)
The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like BIFF8 XLS.
Works for DOS / Windows Spreadsheet (WKS/XLR) (click to show)
All versions of Works were limited to a single worksheet.
Works for DOS 1.x - 3.x and Works for Windows 2.x extends the Lotus WKS format with additional record types.
Works for Windows 3.x - 5.x uses the same format and WKS extension. The BOF record has type 0xFF.
Works for Windows 6.x - 9.x use the XLR format. XLR is nearly identical to BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves the exact Workbook stream for the XLR and the 97-2003 XLS export. Works 6 XLS includes two empty worksheets but the main worksheet has an identical encoding. XLR also includes a WksSSWorkBook
stream similar to Lotus FM3/FMT files.
OpenDocument Spreadsheet (ODS/FODS) (click to show)
ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO add undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.
Uniform Office Spreadsheet (UOS1/UOS2) (click to show)
UOS is a very similar format, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.
Many older formats supported only one worksheet.
dBASE and Visual FoxPro (DBF) (click to show)
DBF is really a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer makes files compatible with Visual FoxPro extensions.
Multi-file extensions like external memos and tables are currently unsupported, limited by the general ability to read arbitrary files in the web browser. The reader understands DBF Level 7 extensions like DATETIME.
Symbolic Link (SYLK) (click to show)
There is no real documentation. All knowledge was gathered by saving files in various versions of Excel to deduce the meaning of fields. Notes:
Lotus Formatted Text (PRN) (click to show)
There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless we can guess the column widths and reverse-engineer the original layout. Excel's 240 character width limitation is not enforced.
Data Interchange Format (DIF) (click to show)
There is no unified definition. Visicalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:
"0.3" -> "=""0.3""
HTML (click to show)
Excel HTML worksheets include special metadata encoded in styles. For example, mso-number-format
is a localized string containing the number format. Despite the metadata the output is valid HTML, although it does accept bare &
symbols.
The writer adds type metadata to the TD elements via the t
tag. The parser looks for those tags and overrides the default interpretation. For example, text like <td>12345</td>
will be parsed as numbers but <td t="s">12345</td>
will be parsed as text.
Rich Text Format (RTF) (click to show)
Excel RTF worksheets are stored in clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.
Ethercalc Record Format (ETH) (click to show)
Ethercalc is an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.
Node (click to show)
make test
will run the node-based tests. By default it runs tests on files in every supported format. To test a specific file type, set FMTS
to the format you want to test. Feature-specific tests are available with make test_misc
$ make test_misc # run core tests
$ make test # run full tests
$ make test_xls # only use the XLS test files
$ make test_xlsx # only use the XLSX test files
$ make test_xlsb # only use the XLSB test files
$ make test_xml # only use the XML test files
$ make test_ods # only use the ODS test files
To enable all errors, set the environment variable WTF=1
:
$ make test # run full tests
$ WTF=1 make test # enable all error messages
flow
and eslint
checks are available:
$ make lint # eslint checks
$ make flow # make lint + Flow checking
$ make tslint # check TS definitions
Browser (click to show)
The core in-browser tests are available at tests/index.html
within this repo. Start a local server and navigate to that directory to run the tests. make ctestserv
will start a server on port 8000.
make ctest
will generate the browser fixtures. To add more files, edit the tests/fixtures.lst
file and add the paths.
To run the full in-browser tests, clone the repo for oss.sheetjs.com
and replace the xlsx.js
file (then open a browser window and go to stress.html
):
$ cp xlsx.js ../SheetJS.github.io
$ cd ../SheetJS.github.io
$ simplehttpserver # or "python -mSimpleHTTPServer" or "serve"
$ open -a Chromium.app http://localhost:8000/stress.html
Tested Environments (click to show)
Tests are run against node versions 0.8, 0.10, 0.12, 4.x, 5.x, 6.x, 7.x and 8.x.
Tests utilize the mocha testing framework.
The test suite also includes tests for various time zones. To change the timezone locally, set the TZ environment variable:
$ env TZ="Asia/Kolkata" WTF=1 make test_misc
Test files are housed in another repo.
Running make init
will refresh the test_files
submodule and get the files. Note that this requires svn
, git
, hg
and other commands that may not be available. If make init
fails, please download the latest version of the test files snapshot from the repo
Latest Snapshot (click to show)
Latest test files snapshot: http://github.com/SheetJS/test_files/releases/download/20170409/test_files.zip
(download and unzip to the test_files
subdirectory)
Due to the precarious nature of the Open Specifications Promise, it is very important to ensure code is cleanroom. See the Contribution Notes for details.
File organization (click to show)
At a high level, the final script is a concatenation of the individual files in the bits
folder. Running make
should reproduce the final output on all platforms. The README is similarly split into bits in the docbits
folder.
Folders:
folder | contents |
---|---|
bits | raw source files that make up the final script |
docbits | raw markdown files that make up README.md |
bin | server-side bin scripts (xlsx.njs ) |
dist | dist files for web browsers and nonstandard JS environments |
demos | demo projects for platforms like ExtendScript and Webpack |
tests | browser tests (run make ctest to rebuild) |
types | typescript definitions and tests |
misc | miscellaneous supporting scripts |
test_files | test files (pulled from the test files repository) |
After cloning the repo, running make help
will display a list of commands.
OSX/Linux (click to show)
The xlsx.js
file is constructed from the files in the bits
subdirectory. The build script (run make
) will concatenate the individual bits to produce the script. Before submitting a contribution, ensure that running make will produce the xlsx.js
file exactly. The simplest way to test is to add the script:
$ git add xlsx.js
$ make clean
$ make
$ git diff xlsx.js
To produce the dist files, run make dist
. The dist files are updated in each version release and should not be committed between versions.
Windows (click to show)
The included make.cmd
script will build xlsx.js
from the bits
directory. Building is as simple as:
> make
To prepare development environment:
> make init
The full list of commands available in Windows are displayed in make help
:
make init -- install deps and global modules
make lint -- run eslint linter
make test -- run mocha test suite
make misc -- run smaller test suite
make book -- rebuild README and summary
make help -- display this message
As explained in Test Files, on Windows the release ZIP file must be downloaded and extracted. If Bash on Windows is available, it is possible to run the OSX/Linux workflow. The following steps prepare the environment:
# Install support programs for the build and test commands
sudo apt-get install make git subversion mercurial
# Install nodejs and NPM within the WSL
wget -qO- https://deb.nodesource.com/setup_8.x | sudo bash
sudo apt-get install nodejs
# Install dev dependencies
sudo npm install -g mocha voc blanket xlsjs
Tests (click to show)
The test_misc
target (make test_misc
on Linux/OSX / make misc
on Windows) runs the targeted feature tests. It should take 5-10 seconds to perform feature tests without testing against the entire test battery. New features should be accompanied with tests for the relevant file formats and features.
For tests involving the read side, an appropriate feature test would involve reading an existing file and checking the resulting workbook object. If a parameter is involved, files should be read with different values to verify that the feature is working as expected.
For tests involving a new write feature which can already be parsed, appropriate feature tests would involve writing a workbook with the feature and then opening and verifying that the feature is preserved.
For tests involving a new write feature without an existing read ability, please add a feature test to the kitchen sink tests/write.js
.
Please consult the attached LICENSE file for details. All rights not explicitly granted by the Apache 2.0 License are reserved by the Original Author.
OSP-covered Specifications (click to show)
- MS-CFB: Compound File Binary File Format
- MS-CTXLS: Excel Custom Toolbar Binary File Format
- MS-EXSPXML3: Excel Calculation Version 2 Web Service XML Schema
- MS-ODATA: Open Data Protocol (OData)
- MS-ODRAW: Office Drawing Binary File Format
- MS-ODRAWXML: Office Drawing Extensions to Office Open XML Structure
- MS-OE376: Office Implementation Information for ECMA-376 Standards Support
- MS-OFFCRYPTO: Office Document Cryptography Structure
- MS-OI29500: Office Implementation Information for ISO/IEC 29500 Standards Support
- MS-OLEDS: Object Linking and Embedding (OLE) Data Structures
- MS-OLEPS: Object Linking and Embedding (OLE) Property Set Data Structures
- MS-OODF3: Office Implementation Information for ODF 1.2 Standards Support
- MS-OSHARED: Office Common Data Types and Objects Structures
- MS-OVBA: Office VBA File Format Structure
- MS-XLDM: Spreadsheet Data Model File Format
- MS-XLS: Excel Binary File Format (.xls) Structure Specification
- MS-XLSB: Excel (.xlsb) Binary File Format
- MS-XLSX: Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format
- XLS: Microsoft Office Excel 97-2007 Binary File Format Specification
- RTF: Rich Text Format

Download Details:
Author: SheetJS
Source Code: https://github.com/SheetJS/sheetjs
License: Apache-2.0 License
1684207573
Different types of files are used in Bash for different purposes. Many options are available in Bash to check whether a particular file exists. The existence of a file can be checked using the file test operators, with or without the "test" command. The purposes of the different file test operators for checking the existence of a file are shown in this tutorial.
Many file test operators exist in Bash to check whether a particular file exists. Some of them are listed below:
Operator | Purpose |
-f | It is used to check if the file exists and if it is a regular file. |
-d | It is used to check if the file exists as a directory. |
-e | It is used to check the existence of the file only. |
-h or -L | It is used to check if the file exists as a symbolic link. |
-r | It is used to check if the file exists as a readable file. |
-w | It is used to check if the file exists as a writable file. |
-x | It is used to check if the file exists as an executable file. |
-s | It is used to check if the file exists and has a nonzero size. |
-b | It is used to check if the file exists as a block special file. |
-c | It is used to check if the file exists as a special character file. |
Several ways of checking for the existence of a regular file are shown in this part of the tutorial.
Create a Bash file with the following script that takes the filename from the user and checks whether the file exists in the current location, using the -f operator in the "if" condition with single square brackets ([ ]).
#!/bin/bash
#Take the filename
echo -n "Enter the filename: "
read filename
#Check whether the file exists or not using the -f operator
if [ -f "$filename" ]; then
echo "File exists."
else
echo "File does not exist."
fi
The script is executed twice in the following output. A non-existent filename is given in the first execution, and an existing filename is given in the second execution. The "ls" command is executed to verify whether the file exists.
Create a Bash file with the following script that takes the filename as a command-line argument and checks whether the file exists in the current location, using the -f operator in the "if" condition with double square brackets ([[ ]]).
#!/bin/bash
#Take the filename from the command-line argument
filename=$1
#Check whether the argument is missing or not
if [ "$filename" != "" ]; then
#Check whether the file exists or not using the -f operator
if [[ -f "$filename" ]]; then
echo "File exists."
else
echo "File does not exist."
fi
else
echo "Argument is missing."
fi
The script is executed twice in the following output. No argument is given in the first execution, and an existing filename is given as an argument in the second execution. The "ls" command is executed to verify whether the file exists.
Create a Bash file with the following script that takes the filename as a command-line argument and checks whether the file exists in the current location, using the -f operator with the "test" command in the "if" condition.
#!/bin/bash
#Take the filename from the command-line argument
filename=$1
#Check whether the argument is missing or not
if [ $# -lt 1 ]; then
echo "No argument is given."
exit 1
fi
#Check whether the file exists or not using the -f operator
if test -f "$filename"; then
echo "File exists."
else
echo "File does not exist."
fi
The script is executed twice in the following output. No argument is given in the first execution, and an existing filename is given in the second execution.
Create a Bash file with the following script that checks whether the file at a particular path exists, using the -f operator with the "test" command in the "if" condition.
#!/bin/bash
#Set the filename with the directory location
filename='temp/courses.txt'
#Check whether the file exists or not using the -f operator
if test -f "$filename"; then
echo "File exists."
else
echo "File does not exist."
fi
The following output appears after executing the script:
The methods of checking whether a regular file exists in the current location or at a particular location are shown in this tutorial using multiple examples.
Original article source at: https://linuxhint.com/
1659574920
Test Automation Made Simple.
Karate is the only open-source tool to combine API test-automation, mocks, performance-testing and even UI automation into a single, unified framework. The BDD syntax popularized by Cucumber is language-neutral, and easy for even non-programmers. Assertions and HTML reports are built-in, and you can run tests in parallel for speed.
There's also a cross-platform stand-alone executable for teams not comfortable with Java. You don't have to compile code. Just write tests in a simple, readable syntax - carefully designed for HTTP, JSON, GraphQL and XML. And you can mix API and UI test-automation within the same test script.
A Java API also exists for those who prefer to programmatically integrate Karate's rich automation and data-assertion capabilities.
If you are familiar with Cucumber / Gherkin, the big difference here is that you don't need to write extra "glue" code or Java "step definitions" !
It is worth pointing out that JSON is a 'first class citizen' of the syntax such that you can express payload and expected data without having to use double-quotes and without having to enclose JSON field names in quotes. There is no need to 'escape' characters like you would have had to in Java or other programming languages.
And you don't need to create additional Java classes for any of the payloads that you need to work with.
A set of real-life examples can be found here: Karate Demos
For teams familiar with or currently using REST-assured, this detailed comparison of Karate vs REST-assured - can help you evaluate Karate. Do note that if you prefer a pure Java API - Karate has that covered, and with far more capabilities.
You can find a lot more references, tutorials and blog-posts in the wiki. Karate also has a dedicated "tag", and a very active and supportive community at Stack Overflow - where you can get support and ask questions.
Getting Started
If you are a Java developer - Karate requires at least Java 8 and then either Maven, Gradle, Eclipse or IntelliJ to be installed. Note that Karate works fine on OpenJDK.
If you are new to programming or test-automation, refer to the options for IDE support and the official IntelliJ plugin is recommended. Other options are the quickstart or the standalone executable.
If you don't want to use Java, you have the option of just downloading and extracting the ZIP release. Try this especially if you don't have much experience with programming or test-automation. We recommend that you use the Karate extension for Visual Studio Code - and with that, JavaScript, .NET and Python programmers will feel right at home.
Visual Studio Code can be used for Java (or Maven) projects as well. One reason to use it is the excellent debug support that we have for Karate.
All you need is available in the karate-core
artifact. You can run tests with this directly, but teams can choose the JUnit variant (shown below) that pulls in JUnit 5 and slightly improves the in-IDE experience.
<dependency>
<groupId>com.intuit.karate</groupId>
<artifactId>karate-junit5</artifactId>
<version>1.2.0</version>
<scope>test</scope>
</dependency>
If you want to use JUnit 4, use karate-junit4
instead of karate-junit5
.
Alternatively for Gradle:
testCompile 'com.intuit.karate:karate-junit5:1.2.0'
Also refer to the wiki for using Karate with Gradle.
It may be easier for you to use the Karate Maven archetype to create a skeleton project with one command. You can then skip the next few sections, as the pom.xml
, recommended directory structure, sample test and JUnit 5 runners - will be created for you.
If you are behind a corporate proxy, or especially if your local Maven installation has been configured to point to a repository within your local network, the command below may not work. One workaround is to temporarily disable or rename your Maven
settings.xml
file, and try again.
You can replace the values of com.mycompany
and myproject
as per your needs.
mvn archetype:generate \
-DarchetypeGroupId=com.intuit.karate \
-DarchetypeArtifactId=karate-archetype \
-DarchetypeVersion=1.2.0 \
-DgroupId=com.mycompany \
-DartifactId=myproject
This will create a folder called myproject
(or whatever you set the name to).
Refer to the wiki - IDE Support.
A Karate test script has the file extension .feature
which is the standard followed by Cucumber. You are free to organize your files using regular Java package conventions.
The Maven tradition is to have non-Java source files in a separate src/test/resources
folder structure - but we recommend that you keep them side-by-side with your *.java
files. When you have a large and complex project, you will end up with a few data files (e.g. *.js
, *.json
, *.txt
) as well and it is much more convenient to see the *.java
and *.feature
files and all related artifacts in the same place.
This can be easily achieved with the following tweak to your maven <build>
section.
<build>
<testResources>
<testResource>
<directory>src/test/java</directory>
<excludes>
<exclude>**/*.java</exclude>
</excludes>
</testResource>
</testResources>
<plugins>
...
</plugins>
</build>
This is very common in the world of Maven users, and keep in mind that these are tests, not production code.
Alternatively, if using Gradle then add the following sourceSets
definition
sourceSets {
test {
resources {
srcDir file('src/test/java')
exclude '**/*.java'
}
}
}
With the above in place, you don't have to keep switching between your src/test/java
and src/test/resources
folders, you can have all your test-code and artifacts under src/test/java
and everything will work as expected.
Once you get used to this, you may even start wondering why projects need a src/test/resources
folder at all !
Soumendra Daas has created a nice example and guide that you can use as a reference here: hello-karate
. This demonstrates a Java Maven + JUnit 5 project set up to test a Spring Boot app.
Since these are tests and not production Java code, you don't need to be bound by the com.mycompany.foo.bar
convention and the un-necessary explosion of sub-folders that ensues. We suggest that you have a folder hierarchy only one or two levels deep - where the folder names clearly identify which 'resource', 'entity' or API is the web-service under test.
For example:
src/test/java
|
+-- karate-config.js
+-- logback-test.xml
+-- some-reusable.feature
+-- some-classpath-function.js
+-- some-classpath-payload.json
|
\-- animals
|
+-- AnimalsTest.java
|
+-- cats
| |
| +-- cats-post.feature
| +-- cats-get.feature
| +-- cat.json
| \-- CatsRunner.java
|
\-- dogs
|
+-- dog-crud.feature
+-- dog.json
+-- some-helper-function.js
\-- DogsRunner.java
Assuming you use JUnit, there are some good reasons for the recommended (best practice) naming convention and choice of file-placement shown above:
- Not using the *Test.java convention for the JUnit classes (e.g. CatsRunner.java) in the cats and dogs folders ensures that these tests will not be picked up when invoking mvn test (for the whole project) from the command line. But you can still invoke these tests from the IDE, which is convenient when in development mode.
- AnimalsTest.java (the only file that follows the *Test.java naming convention) acts as the 'test suite' for the entire project. By default, Karate will load all *.feature files from sub-directories as well. But since some-reusable.feature is above AnimalsTest.java in the folder hierarchy, it will not be picked-up. Which is exactly what we want, because some-reusable.feature is designed to be called only from one of the other test scripts (perhaps with some parameters being passed). You can also use tags to skip files.
- some-classpath-function.js and some-classpath-payload.json are in the 'root' of the Java 'classpath' which means they can be easily read (and re-used) from any test-script by using the classpath: prefix, for e.g: read('classpath:some-classpath-function.js'). Relative paths will also work.

For details on what actually goes into a script or *.feature file, refer to the syntax guide.
file.encoding
In some cases, for large payloads and especially when the default system encoding is not UTF-8
(Windows or non-US locales), you may run into issues where a java.io.ByteArrayInputStream
is encountered instead of a string. Other errors could be a java.net.URISyntaxException
and match
not working as expected because of special or foreign characters, e.g. German or ISO-8859-15
. Typical symptoms are your tests working fine via the IDE but not when running via Maven or Gradle. The solution is to ensure that when Karate tests run, the JVM file.encoding
is set to UTF-8
. This can be done via the maven-surefire-plugin
configuration. Add the plugin to the <build>/<plugins>
section of your pom.xml
if not already present:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.10</version>
<configuration>
<argLine>-Dfile.encoding=UTF-8</argLine>
</configuration>
</plugin>
If you want to use JUnit 4, use the
karate-junit4
Maven dependency instead ofkarate-junit5
.
To run a script *.feature
file from your Java IDE, you just need the following empty test-class in the same package. The name of the class doesn't matter, and it will automatically run any *.feature
file in the same package. This comes in useful because depending on how you organize your files and folders - you can have multiple feature files executed by a single JUnit test-class.
package animals.cats;
import com.intuit.karate.junit4.Karate;
import org.junit.runner.RunWith;
@RunWith(Karate.class)
public class CatsRunner {
}
Refer to your IDE documentation for how to run a JUnit class. Typically right-clicking on the file in the project browser or even within the editor view would bring up the "Run as JUnit Test" menu option.
Karate will traverse sub-directories and look for
*.feature
files. For example if you have the JUnit class in thecom.mycompany
package,*.feature
files incom.mycompany.foo
andcom.mycompany.bar
will also be run. This is one reason why you may want to prefer a 'flat' directory structure as explained above.
Karate supports JUnit 5 and the advantage is that you can have multiple methods in a test-class. Only 1 import
is needed, and instead of a class-level annotation, you use a nice DRY and fluent-api to express which tests and tags you want to use.
Note that the Java class does not need to be public
and even the test methods do not need to be public
- so tests end up being very concise.
Here is an example:
package karate;
import com.intuit.karate.junit5.Karate;
class SampleTest {
@Karate.Test
Karate testSample() {
return Karate.run("sample").relativeTo(getClass());
}
@Karate.Test
Karate testTags() {
return Karate.run("tags").tags("@second").relativeTo(getClass());
}
@Karate.Test
Karate testSystemProperty() {
return Karate.run("classpath:karate/tags.feature")
.tags("@second")
.karateEnv("e2e")
.systemProperty("foo", "bar");
}
}
Note that more "builder" methods are available from the Runner.Builder
class such as reportDir()
etc.
You should be able to right-click and run a single method using your IDE - which should be sufficient when you are in development mode. But to be able to run JUnit 5 tests from the command-line, you need to ensure that the latest version of the maven-surefire-plugin is present in your project pom.xml
(within the <build>/<plugins>
section):
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.2</version>
</plugin>
To run a single test method, for example the testTags()
in the example above, you can do this:
mvn test -Dtest=SampleTest#testTags
Also look at how to run tests via the command-line and the parallel runner.
When you use a JUnit runner - after the execution of each feature, an HTML report is output to the target/karate-reports
folder and the full path will be printed to the console (see video).
html report: (paste into browser to view)
-----------------------------------------
file:///projects/myproject/target/karate-reports/mypackage.myfeature.html
You can easily select (double-click), copy and paste this file:
URL into your browser address bar. This report is useful for troubleshooting and debugging a test because all requests and responses are shown in-line with the steps, along with error messages and the output of print
statements. Just re-fresh your browser window if you re-run the test.
A "dry run" will give you the usual HTML report showing what features will be run, including all steps shown (including comments) so that it can be reviewed. Of course the actual time-durations and logs will be missing, and everything will pass.
The “dry run” report is useful to review the tag "coverage" of what will be run. For example you can get a nice feature “coverage” report, provided you have a rich set of tags. e.g. @smoke @module=one @module=two
etc.
The Runner.Builder
API has a dryRun()
method to switch this on. Note that this mode can be also triggered via the command-line by adding -D
or --dryrun
to the karate.options
.
Normally in dev mode, you will use your IDE to run a *.feature
file directly or via the companion 'runner' JUnit Java class. When you have a 'runner' class in place, it would be possible to run it from the command-line as well.
Note that the mvn test
command only runs test classes that follow the *Test.java
naming convention by default. But you can choose a single test to run like this:
mvn test -Dtest=CatsRunner
karate.options
When your Java test "runner" is linked to multiple feature files, which will be the case when you use the recommended parallel runner, you can narrow down your scope to a single feature, scenario or directory via the command-line, useful in dev-mode. Note how even tags to exclude (or include) can be specified:
Note that any
Feature
orScenario
with the special@ignore
tag will be skipped by default.
mvn test "-Dkarate.options=--tags ~@skipme classpath:demo/cats/cats.feature" -Dtest=DemoTestParallel
Multiple feature files (or paths) can be specified, de-limited by the space character. They should be at the end of the karate.options
. To run only a single scenario, append the line number on which the scenario is defined, de-limited by :
.
mvn test "-Dkarate.options=PathToFeatureFiles/order.feature:12" -Dtest=DemoTestParallel
For Gradle, you must extend the test task to allow the karate.options
to be passed to the runtime (otherwise they get consumed by Gradle itself). To do that, add the following:
test {
// pull karate options into the runtime
systemProperty "karate.options", System.properties.getProperty("karate.options")
// pull karate env into the runtime
systemProperty "karate.env", System.properties.getProperty("karate.env")
// ensure tests are always run
outputs.upToDateWhen { false }
}
And then the above command in Gradle would look like:
./gradlew test --tests *CatsRunner
or
./gradlew test -Dtest.single=CatsRunner
The recommended way to define and run test-suites and reporting in Karate is to use the parallel runner, described in the next section. The approach in this section is more suited for troubleshooting in dev-mode, using your IDE.
One way to define 'test-suites' in Karate is to have a JUnit class at a level 'above' (in terms of folder hierarchy) all the *.feature
files in your project. So if you take the previous folder structure example, you can do this on the command-line:
mvn test "-Dkarate.options=--tags ~@skipme" -Dtest=AnimalsTest
Here, AnimalsTest
is the name of the Java class we designated to run the multiple *.feature
files that make up your test-suite. There is a neat way to tag your tests and the above example demonstrates how to run all tests except the ones tagged @skipme
.
Note that the special, built-in tag @ignore
will always be skipped by default, and you don't need to specify ~@ignore
anywhere.
You can 'lock down' the fact that you only want to execute the single JUnit class that functions as a test-suite - by using the following maven-surefire-plugin configuration:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${maven.surefire.version}</version>
<configuration>
<includes>
<include>animals/AnimalsTest.java</include>
</includes>
<systemProperties>
<karate.options>--tags @smoke</karate.options>
</systemProperties>
</configuration>
</plugin>
Note how the karate.options
can be specified using the <systemProperties>
configuration.
For Gradle, you simply specify the test which is to be include-d:
test {
include 'animals/AnimalsTest.java'
// pull karate options into the runtime
systemProperty "karate.options", System.properties.getProperty("karate.options")
// pull karate env into the runtime
systemProperty "karate.env", System.properties.getProperty("karate.env")
// ensure tests are always run
outputs.upToDateWhen { false }
}
The big drawback of the approach above is that you cannot run tests in parallel. The recommended approach for Karate reporting in a Continuous Integration set-up is described in the next section which can generate the JUnit XML format that most CI tools can consume. The Cucumber JSON format can be also emitted, which gives you plenty of options for generating pretty reports using third-party maven plugins.
And most importantly - you can run tests in parallel without having to depend on third-party hacks that introduce code-generation and config 'bloat' into your pom.xml
or build.gradle
.
Karate can run tests in parallel, and dramatically cut down execution time. This is a 'core' feature and does not depend on JUnit, Maven or Gradle.
- you can use the returned Results object to check if any scenarios failed, and to even summarize the errors
- JUnit XML reports will be saved to the "reportDir" path you specify, and you can easily configure your CI to look for these files after a build (for e.g. in **/*.xml or **/karate-reports/*.xml). Note that you have to call the outputJunitXml(true) method on the Runner "builder".
- Cucumber JSON reports will be saved with the file-extension .json instead of .xml. Note that you have to call the outputCucumberJson(true) method on the Runner "builder".
Important: do not use the @RunWith(Karate.class) annotation. This is a normal JUnit 4 test class ! If you want to use JUnit 4, use the karate-junit4 Maven dependency instead of karate-junit5.
import com.intuit.karate.Results;
import com.intuit.karate.Runner;
import static org.junit.Assert.*;
import org.junit.Test;
public class TestParallel {
@Test
public void testParallel() {
Results results = Runner.path("classpath:some/package").tags("@smoke").parallel(5);
assertTrue(results.getErrorMessages(), results.getFailCount() == 0);
}
}
Points to note:
- you don't use a JUnit runner (no @RunWith annotation), and you write a plain vanilla JUnit test (it could even be a normal Java class with a main method)
- the Runner.path() "builder" method in karate-core is how you refer to the package you want to execute, and all feature files within sub-directories will be picked up
- Runner.path() takes multiple string parameters, so you can refer to multiple packages or even individual *.feature files and easily "compose" a test-suite, for e.g. Runner.path("classpath:animals", "classpath:some/other/package.feature")
- when you use the tags() API, note that by default, any *.feature file tagged with the special (built-in) tag @ignore will be skipped. You can also specify tags on the command-line. The tags() method also takes multiple arguments, for e.g. tags("@customer", "@smoke") or the comma-delimited form tags("@customer,@smoke")
- use the reportDir() method if you want to customize the directory to which the HTML, XML and JSON files will be output, it defaults to target/karate-reports
- you can also pass a List<String> as the path() and tags() methods arguments
- parallel() has to be the last method called, and you pass the number of parallel threads needed. It returns a Results object that has all the information you need - such as the number of passed or failed tests.
For JUnit 5 you can omit the public modifier for the class and method, and there are some changes to import package names. The method signature of the assertTrue has flipped around a bit. Also note that you don't use @Karate.Test for the method, and you just use the normal JUnit 5 @Test annotation.
Otherwise the Runner.path() "builder" API is the same; refer to the description above for JUnit 4.
import com.intuit.karate.Results;
import com.intuit.karate.Runner;
import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;
class TestParallel {
@Test
void testParallel() {
Results results = Runner.path("classpath:animals").tags("~@skipme").parallel(5);
assertEquals(0, results.getFailCount(), results.getErrorMessages());
}
}
For convenience, some stats are logged to the console when execution completes, which should look something like this:
======================================================
elapsed: 2.35 | threads: 5 | thread time: 4.98
features: 54 | ignored: 25 | efficiency: 0.42
scenarios: 145 | passed: 145 | failed: 0
======================================================
The parallel runner will always run Feature
-s in parallel. Karate will also run Scenario
-s in parallel by default. So if you have a Feature
with multiple Scenario
-s in it - they will execute in parallel, and even each Examples
row in a Scenario Outline
will do so !
A karate-timeline.html
file will also be saved to the report output directory mentioned above (target/karate-reports
by default) - which is useful for visually verifying or troubleshooting the effectiveness of the test-run (see video).
@parallel=false
In rare cases you may want to suppress the default of Scenario
-s executing in parallel and the special tag
@parallel=false
can be used. If you place it above the Feature
keyword, it will apply to all Scenario
-s. And if you just want one or two Scenario
-s to NOT run in parallel, you can place this tag above only those Scenario
-s. See example.
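To illustrate the placement, here is a minimal sketch (the feature and scenario names are made up):
@parallel=false
Feature: all scenarios in this feature will run sequentially

Scenario: first
* print 'runs first'

Scenario: second
* print 'runs only after the first completes'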
Note that forcing Scenario
-s to run in a particular sequence is an anti-pattern, and should be avoided as far as possible.
As mentioned above, most CI tools would be able to process the JUnit XML output of the parallel runner and determine the status of the build as well as generate reports.
The Karate Demo has a working example of the recommended parallel-runner set up. It also details how a third-party library can be easily used to generate some very nice-looking reports, from the JSON output of the parallel runner.
For example, here below is an actual report generated by the cucumber-reporting open-source library.
Another example for a popular Maven reporting plugin that is compatible with Karate JSON is Cluecumber.
The demo also features code-coverage using Jacoco, and some tips for even non-Java back-ends. Some third-party report-server solutions integrate with Karate such as ReportPortal.io.
This is optional, and Karate will work without the logging config in place, but the default console logging may be too verbose for your needs.
Karate uses LOGBack which looks for a file called logback-test.xml
on the 'classpath'.
In rare cases, e.g. if you are using Karate to create a Java application, LOGBack will look for logback.xml instead.
Here is a sample logback-test.xml
for you to get started.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<appender name="FILE" class="ch.qos.logback.core.FileAppender">
<file>target/karate.log</file>
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<logger name="com.intuit.karate" level="DEBUG"/>
<root level="info">
<appender-ref ref="STDOUT" />
<appender-ref ref="FILE" />
</root>
</configuration>
You can change the com.intuit.karate
logger level to INFO
to reduce the amount of logging. When the level is DEBUG
the entire request and response payloads are logged. If you use the above config, logs will be captured in target/karate.log
.
If you want to keep the level as DEBUG (for HTML reports) but suppress logging to the console, you can comment out the STDOUT appender-ref within the "root" section:
<root level="warn">
<!-- <appender-ref ref="STDOUT" /> -->
<appender-ref ref="FILE" />
</root>
Or another option is to use a ThresholdFilter
, so you still see critical logs on the console:
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>WARN</level>
</filter>
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
If you want to exclude the logs from your CI/CD pipeline but keep them when your users execute tests locally, you can configure your logback using Janino. In such cases it might be desirable to have your tests use karate.logger.debug('your additional info') instead of the print keyword, so you can keep your pipeline logs at the INFO level.
For suppressing sensitive information such as secrets and passwords from the log and reports, see Log Masking and Report Verbosity.
Configuration
You can skip this section and jump straight to the Syntax Guide if you are in a hurry to get started with Karate. Things will work even if the
karate-config.js
file is not present.
The 'classpath' is a Java concept and is where some configuration files such as the one for logging are expected to be by default. If you use the Maven <test-resources>
tweak described earlier (recommended), the 'root' of the classpath will be in the src/test/java
folder, or else would be src/test/resources
.
karate-config.js
The only 'rule' is that on start-up Karate expects a file called karate-config.js
to exist on the 'classpath' and contain a JavaScript function. The function is expected to return a JSON object and all keys and values in that JSON object will be made available as script variables.
And that's all there is to Karate configuration ! You can easily get the value of the current 'environment' or 'profile', and then set up 'global' variables using some simple JavaScript. Here is an example:
function fn() {
var env = karate.env; // get java system property 'karate.env'
karate.log('karate.env system property was:', env);
if (!env) {
env = 'dev'; // a custom 'intelligent' default
}
var config = { // base config JSON
appId: 'my.app.id',
appSecret: 'my.secret',
someUrlBase: 'https://some-host.com/v1/auth/',
anotherUrlBase: 'https://another-host.com/v1/'
};
if (env == 'stage') {
// over-ride only those that need to be
config.someUrlBase = 'https://stage-host/v1/auth';
} else if (env == 'e2e') {
config.someUrlBase = 'https://e2e-host/v1/auth';
}
// don't waste time waiting for a connection or if servers don't respond within 5 seconds
karate.configure('connectTimeout', 5000);
karate.configure('readTimeout', 5000);
return config;
}
Here above, you see the karate.log(), karate.env and karate.configure() "helpers" being used. Note that the karate-config.js is re-processed for every Scenario and in rare cases, you may want to initialize (e.g. auth tokens) only once for all of your tests. This can be achieved using karate.callSingle().
A common requirement is to pass dynamic parameter values via the command line, and you can use the karate.properties['some.name']
syntax for getting a system property passed via JVM options in the form -Dsome.name=foo
. Refer to the section on dynamic port numbers for an example.
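For instance, a hypothetical property passed on the command-line as -Dsome.name=foo could be read within a feature file (or karate-config.js) like this:
* def someName = karate.properties['some.name']
* print 'the value of some.name is:', someName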
You can even retrieve operating-system environment variables via Java interop as follows:
var systemPath = java.lang.System.getenv('PATH');
This decision to use JavaScript for config is influenced by years of experience with the set-up of complicated test-suites and fighting with Maven profiles, Maven resource-filtering and the XML-soup that somehow gets summoned by the Maven AntRun plugin.
Karate's approach frees you from Maven, is far more expressive, allows you to eyeball all environments in one place, and is still a plain-text file. If you want, you could even create nested chunks of JSON that 'name-space' your config variables.
One way to appreciate Karate's approach is to think over what it takes to add a new environment-dependent variable (e.g. a password) into a test. In typical frameworks it could mean changing multiple properties files, maven profiles and placeholders, and maybe even threading the value via a dependency-injection framework - before you can even access the value within your test.
This approach is indeed slightly more complicated than traditional *.properties
files - but you need this complexity. Keep in mind that these are tests (not production code) and this config is going to be maintained more by the dev or QE team instead of the 'ops' or operations team.
And there is no more worrying about Maven profiles and whether the 'right' *.properties
file has been copied to the proper place.
There is only one thing you need to do to switch the environment - which is to set a Java system property.
By default, the value of karate.env when you access it within karate-config.js would be null.
The recipe for doing this when running Maven from the command line is:
mvn test -DargLine="-Dkarate.env=e2e"
Or in Gradle:
./gradlew test -Dkarate.env=e2e
You can refer to the documentation of the Maven Surefire Plugin for alternate ways of achieving this, but the argLine
approach is the simplest and should be more than sufficient for your Continuous Integration or test-automation needs.
Here's a reminder that running any single JUnit test via Maven can be done by:
mvn test -Dtest=CatsRunner
Where CatsRunner
is the JUnit class name (in any package) you wish to run.
Karate is flexible, you can easily over-write config variables within each individual test-script - which is very convenient when in dev-mode or rapid-prototyping.
System.setProperty("karate.env", "pre-prod");
For advanced users, note that tags and the karate.env
environment-switch can be "linked" using the special environment tags.
When your project gets complex, you can have separate karate-config-<env>.js
files that will be processed for that specific value of karate.env
. This is especially useful when you want to maintain passwords, secrets or even URL-s specific for your local dev environment.
Make sure you configure your source code management system (e.g. Git) to ignore
karate-config-*.js
if needed.
There should always be
karate-config.js
in the "root" folder, even if you don't have any "common" config. In such cases, the function can do nothing or return an empty JSON. Learn more.
Here are the rules Karate uses on bootstrap (before every Scenario
or Examples
row in a Scenario Outline
):
- if the karate.config.dir system-property was set, Karate will look in this folder for karate-config.js - and if found, will process it
- if karate-config.js was not found in the above location (or karate.config.dir was not set), classpath:karate-config.js would be processed (this is the default / common case)
- if the karate.env system property was set:
  - if karate.config.dir was set, Karate will also look for file:<karate.config.dir>/karate-config-<env>.js
  - else (if karate.config.dir was not set), Karate will look for classpath:karate-config-<env>.js
  - if karate-config-<env>.js exists, it will be processed, and the configuration (JSON entries) returned by this function will over-ride any set by karate-config.js
Refer to the karate demo for an example.
karate-base.js
Advanced users who build frameworks on top of Karate have the option to supply a karate-base.js
file that Karate will look for on the classpath:
. This is useful when you ship a JAR file containing re-usable features and JavaScript / Java code and want to 'default' a few variables that teams can 'inherit' from. So an additional rule in the above flow of 'rules' (before the first step) is as follows:
- if classpath:karate-base.js exists - Karate will process this as a configuration source before anything else
Syntax Guide
Karate scripts are technically in 'Gherkin' format - but all you need to grok as someone who needs to test web-services are the three sections: Feature
, Background
and Scenario
. There can be multiple Scenario-s in a *.feature
file, and at least one should be present. The Background
is optional.
Variables set using def in the Background will be re-set before every Scenario. If you are looking for a way to do something only once per Feature, take a look at callonce. On the other hand, if you are expecting a variable in the Background to be modified by one Scenario so that later ones can see the updated value - that is not how you should think of them, and you should combine your 'flow' into one scenario. Keep in mind that you should be able to comment-out a Scenario or skip some via tags without impacting any others. Note that the parallel runner will run Scenario-s in parallel, which means they can run in any order. If you are looking for ways to do something only once per feature or across all your tests, see Hooks.
Lines that start with a #
are comments.
Feature: brief description of what is being tested
more lines of description if needed.
Background:
# this section is optional !
# steps here are executed before each Scenario in this file
# variables defined here will be 'global' to all scenarios
# and will be re-initialized before every scenario
Scenario: brief description of this scenario
# steps for this scenario
Scenario: a different scenario
# steps for this other scenario
There is also a variant of Scenario called Scenario Outline along with Examples, useful for data-driven tests.
The business of web-services testing requires access to low-level aspects such as HTTP headers, URL-paths, query-parameters, complex JSON or XML payloads and response-codes. And Karate gives you control over these aspects with the small set of keywords focused on HTTP such as url
, path
, param
, etc.
Karate does not attempt to have tests be in "natural language" like how Cucumber tests are traditionally expected to be. That said, the syntax is very concise, and the convention of every step having to start with either Given
, And
, When
or Then
, makes things very readable. You end up with a decent approximation of BDD even though web-services by nature are "headless", without a UI, and not really human-friendly.
Karate was based on Cucumber-JVM until version 0.8.0 but the parser and engine were re-written from scratch in 0.9.0 onwards. So we use the same Gherkin syntax - but the similarity ends there.
If you are familiar with Cucumber (JVM), you may be wondering if you need to write step-definitions. The answer is no.
Karate's approach is that all the step-definitions you need in order to work with HTTP, JSON and XML have been already implemented. And since you can easily extend Karate using JavaScript, there is no need to compile Java code any more.
The following table summarizes some key differences between Cucumber and Karate.
:white_small_square: | Cucumber | Karate |
---|---|---|
Step Definitions Built-In | No. You need to keep implementing them as your functionality grows. This can get very tedious, especially since for dependency-injection, you are on your own. | :white_check_mark: Yes. No extra Java code needed. |
Single Layer of Code To Maintain | No. There are 2 Layers. The Gherkin spec or *.feature files make up one layer, and you will also have the corresponding Java step-definitions. | :white_check_mark: Yes. Only 1 layer of Karate-script (based on Gherkin). |
Readable Specification | Yes. Cucumber will read like natural language if you implement the step-definitions right. | :x: No. Although Karate is simple, and a true DSL, it is ultimately a mini-programming language. But it is perfect for testing web-services at the level of HTTP requests and responses. |
Re-Use Feature Files | No. Cucumber does not support being able to call (and thus re-use) other *.feature files from a test-script. | :white_check_mark: Yes. |
Dynamic Data-Driven Testing | No. Cucumber's Scenario Outline expects the Examples to contain a fixed set of rows. | :white_check_mark: Yes. Karate's support for calling other *.feature files allows you to use a JSON array as the data-source and you can use JSON or even CSV directly in a data-driven Scenario Outline . |
Parallel Execution | No. There are some challenges (especially with reporting) and you can find various discussions and third-party projects on the web that attempt to close this gap | :white_check_mark: Yes. Karate runs even Scenario -s in parallel, not just Feature -s. |
Run 'Set-Up' Routines Only Once | No. Cucumber has a limitation where Background steps are re-run for every Scenario and worse - even for every Examples row within a Scenario Outline . This has been a highly-requested open issue for a long time. | :white_check_mark: Yes. |
Embedded JavaScript Engine | No. And you have to roll your own approach to environment-specific configuration and worry about dependency-injection. | :white_check_mark: Yes. Easily define all environments in a single file and share variables across all scenarios. Full script-ability via JS or Java interop. |
One nice thing about the design of the Gherkin syntax is that script-steps are treated the same no matter whether they start with the keyword Given
, And
, When
or Then
. What this means is that you are free to use whatever makes sense for you. You could even have all the steps start with When
and Karate won't care.
In fact Gherkin supports the catch-all symbol '*
' - instead of forcing you to use Given
, When
or Then
. This is perfect for those cases where it really doesn't make sense - for example the Background
section or when you use the def
or set
syntax. When eyeballing a test-script, think of the *
as a 'bullet-point'.
You can read more about the Given-When-Then convention at the Cucumber reference documentation. Since Karate uses Gherkin, you can also employ data-driven techniques such as expressing data-tables in test scripts. Another good thing that Karate inherits is the nice IDE support for Cucumber that IntelliJ and Eclipse have. So you can do things like right-click and run a *.feature
file (or scenario) without needing to use a JUnit runner.
For a detailed discussion on BDD and how Karate relates to Cucumber, please refer to this blog-post: Yes, Karate is not true BDD. It is the opinion of the author of Karate that true BDD is un-necessary over-kill for API testing, and this is explained more in this answer on Stack Overflow.
With the formalities out of the way, let's dive straight into the syntax.
Setting and Using Variables
def
# assigning a string value:
Given def myVar = 'world'
# using a variable
Then print myVar
# assigning a number (you can use '*' instead of Given / When / Then)
* def myNum = 5
Note that def
will over-write any variable that was using the same name earlier. Keep in mind that the start-up configuration routine could have already initialized some variables before the script even started. For details of scope and visibility of variables, see Script Structure.
Note that url and request are not allowed as variable names. This is just to reduce confusion for users new to Karate who tend to do * def request = {} and expect the request body (or similarly, the url) to be set.
The examples above are simple, but a variety of expression 'shapes' are supported on the right hand side of the =
symbol. The section on Karate Expressions goes into the details.
assert
Once defined, you can refer to a variable by name. Expressions are evaluated using the embedded JavaScript engine. The assert keyword can be used to assert that an expression evaluates to a boolean value of true.
Given def color = 'red '
And def num = 5
Then assert color + num == 'red 5'
Everything to the right of the assert
keyword will be evaluated as a single expression.
Something worth mentioning here is that you would hardly need to use assert
in your test scripts. Instead you would typically use the match
keyword, that is designed for performing powerful assertions against JSON and XML response payloads.
print
You can use print
to log variables to the console in the middle of a script. For convenience, you can have multiple expressions separated by commas, so this is the recommended pattern:
* print 'the value of a is:', a
Similar to assert
, the expressions on the right-hand-side of a print
have to be valid JavaScript. JsonPath and Karate expressions are not supported.
If you use commas (instead of concatenating strings using +
), Karate will 'pretty-print' variables, which is what you typically want when dealing with JSON or XML.
* def myJson = { foo: 'bar', baz: [1, 2, 3] }
* print 'the value of myJson is:', myJson
Which results in the following output:
20:29:11.290 [main] INFO com.intuit.karate - [print] the value of myJson is: {
"foo": "bar",
"baz": [
1,
2,
3
]
}
Since XML is represented internally as a JSON-like or map-like object, if you perform string concatenation when printing, you will not see XML - which can be confusing at first. Use the comma-delimited form (see above) or the JS helper (see below).
The built-in karate
object is explained in detail later, but for now, note that this is also injected into print
(and even assert
) statements, and it has a helpful pretty
method, that takes a JSON argument and a prettyXml
method that deals with XML. So you could have also done something like:
* print 'the value of myJson is:\n' + karate.pretty(myJson)
Also refer to the configure
keyword on how to switch on pretty-printing of all HTTP requests and responses.
'Native' data types
Native data types mean that you can insert them into a script without having to worry about enclosing them in strings and then having to 'escape' double-quotes all over the place. They seamlessly fit 'in-line' within your test script.
Note that the parser is 'lenient' so that you don't have to enclose all keys in double-quotes.
* def cat = { name: 'Billie', scores: [2, 5] }
* assert cat.scores[1] == 5
Some characters such as the hyphen - are not permitted in 'lenient' JSON keys (because they are interpreted by the JS engine as a 'minus sign'). In such cases, you have to use string quotes: { 'Content-Type': 'application/json' }
When asserting for expected values in JSON or XML, always prefer using match
instead of assert
. Match failure messages are much more descriptive and useful, and you get the power of embedded expressions and fuzzy matching.
* def cats = [{ name: 'Billie' }, { name: 'Bob' }]
* match cats[1] == { name: 'Bob' }
Karate's native support for JSON means that you can assign parts of a JSON instance into another variable, which is useful when dealing with complex response
payloads.
* def first = cats[0]
* match first == { name: 'Billie' }
For manipulating or updating JSON (or XML) using path expressions, refer to the set
keyword.
Given def cat = <cat><name>Billie</name><scores><score>2</score><score>5</score></scores></cat>
# sadly, xpath list indexes start from 1
Then match cat/cat/scores/score[2] == '5'
# but karate allows you to traverse xml like json !!
Then match cat.cat.scores.score[1] == 5
Karate has a very useful payload 'templating' approach. Variables can be referred to within JSON, for example:
Given def user = { name: 'john', age: 21 }
And def lang = 'en'
When def session = { name: '#(user.name)', locale: '#(lang)', sessionUser: '#(user)' }
So the rule is - if a string value within a JSON (or XML) object declaration is enclosed between #(
and )
- it will be evaluated as a JavaScript expression. And any variables which are alive in the context can be used in this expression. Here's how it works for XML:
Given def user = <user><name>john</name></user>
And def lang = 'en'
When def session = <session><locale>#(lang)</locale><sessionUser>#(user)</sessionUser></session>
This comes in useful in some cases - and avoids needing to use the set
keyword or JavaScript functions to manipulate JSON. So you get the best of both worlds: the elegance of JSON to express complex nested data - while at the same time being able to dynamically plug values (that could even be other JSON or XML 'trees') into a 'template'.
Note that embedded expressions will be evaluated even when you read()
from a JSON or XML file. This is super-useful for re-use and data-driven tests.
A few special built-in variables such as $
(which is a reference to the JSON root) - can be mixed into JSON embedded expressions.
A special case of embedded expressions can remove a JSON key (or XML element / attribute) if the expression evaluates to null
.
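As a minimal sketch of this (the variable names are made up), note the double-hash ##( ) form that marks the key as optional:
* def nickName = null
* def user = { name: 'john', nick: '##(nickName)' }
* match user == { name: 'john' }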
To summarize the rules for embedded expressions:
- they are evaluated even when you read() a JSON or XML file
- they have to start with #( and end with )
Because of the last rule above, note that string-concatenation may not work quite the way you expect:
# wrong !
* def foo = { bar: 'hello #(name)' }
# right !
* def foo = { bar: '#("hello " + name)' }
Observe how you can achieve string concatenation if you really want, because any valid JavaScript expression can be stuffed within an embedded expression. You could always do this in two steps:
* def temp = 'hello ' + name
* def foo = { bar: '#(temp)' }
As a convenience, embedded expressions are supported on the Right Hand Side of a match
statement even for "quoted string" literals:
* def foo = 'a1'
* match foo == '#("a" + 1)'
And do note that in Karate 1.0 onwards, ES6 string-interpolation within "backticks" is supported:
* param filter = `ORDER_DATE:"${todaysDate}"`
An alternative to embedded expressions (for JSON only) is to enclose the entire payload within parentheses - which tells Karate to evaluate it as pure JavaScript. This can be a lot simpler than embedded expressions in many cases, and JavaScript programmers will feel right at home.
The example below shows the difference between embedded expressions and enclosed JavaScript:
When def user = { name: 'john', age: 21 }
And def lang = 'en'
* def embedded = { name: '#(user.name)', locale: '#(lang)', sessionUser: '#(user)' }
* def enclosed = ({ name: user.name, locale: lang, sessionUser: user })
* match embedded == enclosed
So how would you choose between the two approaches to create JSON ? Embedded expressions are useful when you have complex JSON read from files, because you can auto-replace (or even remove) data-elements with values dynamically evaluated from variables. And the JSON will still be 'well-formed', and editable in your IDE or text-editor. Embedded expressions also make more sense in validation and schema-like short-cut situations. It can also be argued that the # symbol is easy to spot when eyeballing your test scripts - which makes things more readable and clear.
The keywords def
, set
, match
, request
and eval
take multi-line input as the last argument. This is useful when you want to express a one-off lengthy snippet of text in-line, without having to split it out into a separate file. Note how triple-quotes ("""
) are used to enclose content. Here are some examples:
# instead of:
* def cat = <cat><name>Billie</name><scores><score>2</score><score>5</score></scores></cat>
# this is more readable:
* def cat =
"""
<cat>
<name>Billie</name>
<scores>
<score>2</score>
<score>5</score>
</scores>
</cat>
"""
# example of a request payload in-line
Given request
"""
<?xml version='1.0' encoding='UTF-8'?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns2:QueryUsageBalance xmlns:ns2="http://www.mycompany.com/usage/V1">
<ns2:UsageBalance>
<ns2:LicenseId>12341234</ns2:LicenseId>
</ns2:UsageBalance>
</ns2:QueryUsageBalance>
</S:Body>
</S:Envelope>
"""
# example of a payload assertion in-line
Then match response ==
"""
{ id: { domain: "DOM", type: "entityId", value: "#ignore" },
created: { on: "#ignore" },
lastUpdated: { on: "#ignore" },
entityState: "ACTIVE"
}
"""
table
Now that we have seen how JSON is a 'native' data type that Karate understands, there is a very nice way to create JSON using Cucumber's support for expressing data-tables.
* table cats
| name | age |
| 'Bob' | 2 |
| 'Wild' | 4 |
| 'Nyan' | 3 |
* match cats == [{name: 'Bob', age: 2}, {name: 'Wild', age: 4}, {name: 'Nyan', age: 3}]
The match
keyword is explained later, but it should be clear right away how convenient the table
keyword is. JSON can be combined with the ability to call other *.feature
files to achieve dynamic data-driven testing in Karate.
Notice that in the above example, string values within the table need to be enclosed in quotes. Otherwise they would be evaluated as expressions - which does come in useful for some dynamic data-driven situations:
* def one = 'hello'
* def two = { baz: 'world' }
* table json
| foo | bar |
| one | { baz: 1 } |
| two.baz | ['baz', 'ban'] |
* match json == [{ foo: 'hello', bar: { baz: 1 } }, { foo: 'world', bar: ['baz', 'ban'] }]
Yes, you can even nest chunks of JSON in tables, and things work as you would expect.
Empty cells or expressions that evaluate to null
will result in the key being omitted from the JSON. To force a null
value, wrap it in parentheses:
* def one = { baz: null }
* table json
| foo | bar |
| 'hello' | |
| one.baz | (null) |
| 'world' | null |
* match json == [{ foo: 'hello' }, { bar: null }, { foo: 'world' }]
An alternate way to create data is using the set
multiple syntax. It is actually a 'transpose' of the table
approach, and can be very convenient when there are a large number of keys per row or if the nesting is complex. Here is an example of what is possible:
* set search
| path | 0 | 1 | 2 |
| name.first | 'John' | 'Jane' | |
| name.last | 'Smith' | 'Doe' | 'Waldo' |
| age | 20 | | |
* match search[0] == { name: { first: 'John', last: 'Smith' }, age: 20 }
* match search[1] == { name: { first: 'Jane', last: 'Doe' } }
* match search[2] == { name: { last: 'Waldo' } }
text
Not something you would commonly use, but in some cases you need to disable Karate's default behavior of attempting to parse anything that looks like JSON (or XML) when using multi-line / string expressions. This is especially relevant when manipulating GraphQL queries - because although they look suspiciously like JSON, they are not, and tend to confuse Karate's internals. And as shown in the example below, having text 'in-line' is useful especially when you use the Scenario Outline:
and Examples:
for data-driven tests involving Cucumber-style place-holder substitutions in strings.
Scenario Outline:
# note the 'text' keyword instead of 'def'
* text query =
"""
{
hero(name: "<name>") {
height
mass
}
}
"""
Given path 'graphql'
And request { query: '#(query)' }
And header Accept = 'application/json'
When method post
Then status 200
Examples:
| name |
| John |
| Smith |
Note that if you did not need to inject Examples:
into 'placeholders' enclosed within <
and >
, reading from a file with the extension *.txt
may have been sufficient.
For placeholder-substitution, the replace
keyword can be used instead, but with the advantage that the text can be read from a file or dynamically created.
Karate is a great fit for testing GraphQL because of how easy it is to deal with dynamic and deeply nested JSON responses. Refer to this example for more details: graphql.feature
.
replace
Modifying existing JSON and XML is natively supported by Karate via the set keyword, and replace is primarily intended for dealing with raw strings. But when you deal with complex, nested JSON (or XML) - it may be easier in some cases to use replace, especially when you want to substitute multiple placeholders with one value, and when you don't need array manipulation. Since replace auto-converts the result to a string, make sure you perform type conversion back to JSON (or XML) if applicable.
Karate provides an elegant 'native-like' experience for placeholder substitution within strings or text content. This is useful in any situation where you need to concatenate dynamic string fragments to form content such as GraphQL or SQL.
The placeholder format defaults to angle-brackets, for example: <replaceMe>
. Here is how to replace one placeholder at a time:
* def text = 'hello <foo> world'
* replace text.foo = 'bar'
* match text == 'hello bar world'
Karate makes it really easy to substitute multiple placeholders in a single, readable step as follows:
* def text = 'hello <one> world <two> bye'
* replace text
| token | value |
| one | 'cruel' |
| two | 'good' |
* match text == 'hello cruel world good bye'
Note how strings have to be enclosed in quotes. This is so that you can mix expressions into text replacements as shown below. This example also shows how you can use a custom placeholder format instead of the default:
* def text = 'hello <one> world ${two} bye'
* def first = 'cruel'
* def json = { second: 'good' }
* replace text
| token | value |
| one | first |
| ${two} | json.second |
* match text == 'hello cruel world good bye'
Refer to this file for a detailed example: replace.feature
For those who may prefer YAML as a simpler way to represent data, Karate allows you to read YAML content from a file - and it will be auto-converted into JSON.
# yaml from a file (the extension matters), and the data-type of 'bar' would be JSON
* def bar = read('data.yaml')
yaml
A very rare need is to be able to convert a string which happens to be in YAML form into JSON, and this can be done via the yaml
type cast keyword. For example - if a response data element or downloaded file is YAML and you need to use the data in subsequent steps. Also see type conversion.
* text foo =
"""
name: John
input:
id: 1
subType:
name: Smith
deleted: false
"""
# yaml to json type conversion
* yaml foo = foo
* match foo ==
"""
{
name: 'John',
input: {
id: 1,
subType: { name: 'Smith', deleted: false }
}
}
"""
Karate can read *.csv
files and will auto-convert them to JSON. A header row is always expected. See the section on reading files - and also this example dynamic-csv.feature
, which shows off the convenience of dynamic Scenario Outline
-s.
In rare cases you may want to use a CSV file as-is and not auto-convert it to JSON. A good example is when you want to use a CSV file as the request-body for a file-upload. You could get by with renaming the file-extension to, say, *.txt - but an alternative is to use the karate.readAsString() API.
csv
Just like yaml
, you may occasionally need to convert a string which happens to be in CSV form into JSON, and this can be done via the csv
keyword.
* text foo =
"""
name,type
Billie,LOL
Bob,Wild
"""
* csv bar = foo
* match bar == [{ name: 'Billie', type: 'LOL' }, { name: 'Bob', type: 'Wild' }]
JavaScript Functions are also 'native'. And yes, functions can take arguments.
Standard JavaScript syntax rules apply, but the right-hand-side should begin with the function keyword if declared in-line. When using stand-alone *.js files, you can have a comment before the function keyword, and you can use fn as the function name, so that your IDE does not complain about JavaScript syntax errors, e.g. function fn(x){ return x + 1 }
* def greeter = function(title, name) { return 'hello ' + title + ' ' + name }
* assert greeter('Mr.', 'Bob') == 'hello Mr. Bob'
When JavaScript executes in Karate, the built-in
karate
object provides some commonly used utility functions. And with Karate expressions, you can "dive into" JavaScript without needing to define a function - and conditional logic is a good example.
For more complex functions you are better off using the multi-line 'doc-string' approach. This example actually calls into existing Java code, and being able to do this opens up a whole lot of possibilities. The JavaScript interpreter will try to convert types across Java and JavaScript as smartly as possible. For e.g. JSON objects become Java Map
-s, JSON arrays become Java List
-s, and Java Bean properties are accessible (and update-able) using 'dot notation' e.g. 'object.name
'
* def dateStringToLong =
"""
function(s) {
var SimpleDateFormat = Java.type('java.text.SimpleDateFormat');
var sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
return sdf.parse(s).time; // '.getTime()' would also have worked instead of '.time'
}
"""
* assert dateStringToLong("2016-12-24T03:39:21.081+0000") == 1482550761081
More examples of Java interop and how to invoke custom code can be found in the section on Calling Java.
The call
keyword provides an alternate way of calling JavaScript functions that have only one argument. The argument can be provided after the function name, without parentheses, which makes things slightly more readable (and less cluttered) especially when the solitary argument is JSON.
* def timeLong = call dateStringToLong '2016-12-24T03:39:21.081+0000'
* assert timeLong == 1482550761081
# a better example, with a JSON argument
* def greeter = function(name){ return 'Hello ' + name.first + ' ' + name.last + '!' }
* def greeting = call greeter { first: 'John', last: 'Smith' }
Karate makes re-use of payload data, utility-functions and even other test-scripts as easy as possible. Teams typically define complicated JSON (or XML) payloads in a file and then re-use this in multiple scripts. Keywords such as set
and remove
allow you to 'tweak' payload-data to fit the scenario under test. You can imagine how this greatly simplifies setting up tests for boundary conditions. And such re-use makes it easier to re-factor tests when needed, which is great for maintainability.
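As a small sketch of this kind of 'tweaking' (the payload is made up):
* def cat = { name: 'Billie', scores: [2, 5] }
* set cat.name = 'Bob'
* remove cat.scores
* match cat == { name: 'Bob' }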
Note that the
set
(multiple) keyword can build complex, nested JSON (or XML) from scratch in a data-driven manner, and you may not even need to read from files for many situations. Test data can be within the main flow itself, which makes scripts highly readable.
Reading files is achieved using the built-in JavaScript function called read()
. By default, the file is expected to be in the same folder (package) and side-by-side with the *.feature
file. But you can prefix the name with classpath:
in which case the 'root' folder would be src/test/java
(assuming you are using the recommended folder structure).
Prefer classpath:
when a file is expected to be heavily re-used all across your project. And yes, relative paths will work.
# json
* def someJson = read('some-json.json')
* def moreJson = read('classpath:more-json.json')
# xml
* def someXml = read('../common/my-xml.xml')
# import yaml (will be converted to json)
* def jsonFromYaml = read('some-data.yaml')
# csv (will be converted to json)
* def jsonFromCsv = read('some-data.csv')
# string
* def someString = read('classpath:messages.txt')
# javascript (will be evaluated)
* def someValue = read('some-js-code.js')
# if the js file evaluates to a function, it can be re-used later using the 'call' keyword
* def someFunction = read('classpath:some-reusable-code.js')
* def someCallResult = call someFunction
# the following short-cut is also allowed
* def someCallResult = call read('some-js-code.js')
You can also re-use other *.feature
files from test-scripts:
# perfect for all those common authentication or 'set up' flows
* def result = call read('classpath:some-reusable-steps.feature')
When a called feature depends on some side-by-side resources such as JSON or JS files, you can use the this:
prefix to ensure that relative paths work correctly - because by default Karate calculates relative paths from the "root" feature or the top-most "caller".
* def data = read('this:payload.json')
If a file does not end in .json
, .xml
, .yaml
, .js
, .csv
or .txt
, it is treated as a stream - which is typically what you would need for multipart
file uploads.
* def someStream = read('some-pdf.pdf')
The .graphql and .gql extensions are also recognized (for GraphQL) but are handled the same way as .txt and treated as a string.
For JSON and XML files, Karate will evaluate any embedded expressions on load. This enables more concise tests, and the file can be re-usable in multiple, data-driven tests.
Since it is internally implemented as a JavaScript function, you can mix calls to read()
freely wherever JavaScript expressions are allowed:
* def someBigString = read('first.txt') + read('second.txt')
Tip: you can even use JS expressions to dynamically choose a file based on some condition:
* def someConfig = read('my-config-' + someVariable + '.json')
Refer to conditional logic for more ideas.
And a very common need would be to use a file as the request
body:
Given request read('some-big-payload.json')
Or in a match
:
And match response == read('expected-response-payload.json')
The rarely used file:
prefix is also supported. You could use it for 'hard-coded' absolute paths in dev mode, but is obviously not recommended for CI test-suites. A good example of where you may need this is if you programmatically write a file to the target
folder, and then you can read it like this:
* def payload = read('file:target/large.xml')
To summarize the possible prefixes:
Prefix | Description |
---|---|
classpath: | relative to the classpath, recommended for re-usable features |
file: | do not use this unless you know what you are doing, see above |
this: | when in a called feature, ensure that files are resolved relative to the current feature file |
Take a look at the Karate Demos for real-life examples of how you can use files for validating HTTP responses, like this one: read-files.feature
.
In some rare cases where you don't want to auto-convert JSON, XML, YAML or CSV, and just get the raw string content (without having to re-name the file to end with .txt
) - you can use the karate.readAsString()
API. Here is an example of using a CSV file as the request-body:
Given path 'upload'
And header Content-Type = 'text/csv'
And request karate.readAsString('classpath:my.csv')
When method post
Then status 202
Best practice is to stick to using only
def
unless there is a very good reason to do otherwise.
Internally, Karate will auto-convert JSON (and even XML) to Java Map
objects. And JSON arrays would become Java List
-s. But you will never need to worry about this internal data-representation most of the time.
In some rare cases, for e.g. if you acquired a string from some external source, or if you generated JSON (or XML) by concatenating text or using replace
, you may want to convert a string to JSON and vice-versa. You can even perform a conversion from XML to JSON if you want.
One example of when you may want to convert JSON (or XML) to a string is when you are passing a payload to custom code via Java interop. Do note that when passing JSON, the default Map
and List
representations should suffice for most needs (see example), and using them would avoid un-necessary string-conversion.
So you have the following type markers you can use instead of def
(or the rarely used text
). The first four below are best explained in this example file: type-conv.feature
.
- string - convert JSON or any other data-type (except XML) to a string
- json - convert XML, a map-like or list-like object, a string, or even a Java object into JSON
- xml - convert JSON, a map-like object, a string, or even a Java object into XML
- xmlstring - specifically for converting the map-like Karate internal representation of XML into a string
- csv - convert a CSV string into JSON, see csv
- yaml - convert a YAML string into JSON, see yaml
- bytes - convert to a byte-array, useful for binary payloads or comparisons, see example
- copy - to clone a given payload variable reference (JSON, XML, Map or List), refer: copy
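For instance, here is a minimal sketch of the json type marker converting XML (the payload is made up):
* def data = <root><foo>bar</foo></root>
* json converted = data
* match converted == { root: { foo: 'bar' } }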
The csv
and yaml
types can be initialized in-line using the "triple quote" or "docstring" multi-line approach as shown here.
If you want to 'pretty print' a JSON or XML value with indenting, refer to the documentation of the print
keyword.
While converting a number to a string is easy (just concatenate an empty string e.g. myInt + ''
), in some rare cases, you may need to convert a string to a number. You can do this by multiplying by 1
or using the built-in JavaScript parseInt()
function:
* def foo = '10'
* string json = { bar: '#(1 * foo)' }
* match json == '{"bar":10.0}'
* string json = { bar: '#(parseInt(foo))' }
* match json == '{"bar":10.0}'
As per the JSON spec, all numeric values are treated as doubles, so for integers - it really doesn't matter if there is a decimal point or not. In fact it may be a good idea to slip doubles instead of integers into some of your tests ! Anyway, there are times when you may want to force integers (perhaps for cosmetic reasons) and you can easily do so using the 'double-tilde' short-cut: '~~
'.
* def foo = '10'
* string json = { bar: '#(~~foo)' }
* match json == '{"bar":10}'
# JS math can introduce a decimal point in some cases
* def foo = 100
* string json = { bar: '#(foo * 0.1)' }
* match json == '{"bar":10.0}'
# but you can easily coerce to an integer if needed
* string json = { bar: '#(~~(foo * 0.1))' }
* match json == '{"bar":10}'
Sometimes when dealing with very large numbers, the JS engine may mangle the number into scientific notation:
* def big = 123123123123
* string json = { num: '#(big)' }
* match json == '{"num":1.23123123123E11}'
This can be easily solved by using java.math.BigDecimal
:
* def big = new java.math.BigDecimal(123123123123)
* string json = { num: '#(big)' }
* match json == '{"num":123123123123}'
doc
Karate has a built-in HTML templating engine that can be used to insert additional custom HTML into the test-reports. Here is an example:
* url 'https://jsonplaceholder.typicode.com/users'
* method get
* doc { read: 'users.html' }
Any Karate variable will be available to the template, which is users.html
in this example.
<table class="table table-striped">
<thead>
<tr>
<th>ID</th>
<th>Name</th>
<th>E-Mail</th>
</tr>
</thead>
<tbody>
<tr th:each="user: response">
<td th:text="user.id"></td>
<td th:text="user.name"></td>
<td th:text="user.email"></td>
</tr>
</tbody>
</table>
You can see what the result looks like here.
Since templates can be loaded using the classpath:
prefix, you can even re-use templates across your projects via Java JAR files.
Karate Expressions
Before we get to the HTTP keywords, it is worth doing a recap of the various 'shapes' that the right-hand-side of an assignment statement can take:
Example | Shape | Description |
---|---|---|
* def foo = 'bar' | JS | simple strings, numbers or booleans |
* def foo = 'bar' + baz[0] | JS | any valid JavaScript expression, and variables can be mixed in, another example: bar.length + 1 |
* def foo = { bar: '#(baz)' } | JSON | anything that starts with a { or a [ is parsed as JSON, use text instead of def if you need to suppress the default behavior |
* def foo = ({ bar: baz }) | JS | enclosed JavaScript, the result of which is exactly equivalent to the above |
* def foo = <foo>bar</foo> | XML | anything that starts with a < is parsed as XML, use text instead of def if you need to suppress the default behavior |
* def foo = function(arg){ return arg + bar } | JS Fn | anything that starts with function(...){ is parsed as a JS function. |
* def foo = read('bar.json') | JS | using the built-in read() function |
* def foo = $.bar[0] | JsonPath | short-cut JsonPath on the response |
* def foo = /bar/baz | XPath | short-cut XPath on the response |
* def foo = get bar $..baz[?(@.ban)] | get JsonPath | JsonPath on the variable bar , you can also use get[0] to get the first item if the JsonPath evaluates to an array - especially useful when using wildcards such as [*] or filter-criteria |
* def foo = $bar..baz[?(@.ban)] | $var.JsonPath | convenience short-cut for the above |
* def foo = get bar count(/baz//ban) | get XPath | XPath on the variable bar |
* def foo = karate.pretty(bar) | JS | using the built-in karate object in JS expressions |
* def Foo = Java.type('com.mycompany.Foo') | JS-Java | Java Interop, and even package-name-spaced one-liners like java.lang.System.currentTimeMillis() are possible |
* def foo = call bar { baz: '#(ban)' } | call | or callonce , where expressions like read('foo.js') are allowed as the object to be called or the argument |
* def foo = bar({ baz: ban }) | JS | equivalent to the above, JavaScript function invocation |
Core Keywords
They are url
, path
, request
, method
and status
.
These are essential HTTP operations, they focus on setting one (un-named or 'key-less') value at a time and therefore don't need an =
sign in the syntax.
url
Given url 'https://myhost.com/v1/cats'
A URL remains constant until you use the url
keyword again, so this is a good place to set-up the 'non-changing' parts of your REST URL-s.
A URL can take expressions, so the approach below is legal. And yes, variables can come from global config.
Given url 'https://' + e2eHostName + '/v1/api'
If you are trying to build dynamic URLs including query-string parameters in the form: http://myhost/some/path?foo=bar&search=true
- please refer to the param
keyword.
path
REST-style path parameters. Can be expressions that will be evaluated. Comma delimited values are supported which can be more convenient, and takes care of URL-encoding and appending '/' between path segments as needed.
Given path 'documents', documentId, 'download'
# or you can do the same on multiple lines if you wish
Given path 'documents'
And path documentId
And path 'download'
Note that the path
'resets' after any HTTP request is made but not the url
. The Hello World is a great example of 'REST-ful' use of the url
when the test focuses on a single REST 'resource'. Look at how the path
did not need to be specified for the second HTTP get
call since /cats
is part of the url
.
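As a small sketch of this reset behavior (the URL and id are made up):
Given url 'https://myhost.com/v1/cats'
And path 'some-id'
When method get
Then status 200
# 'path' has reset, so this second request goes to /v1/cats
When method get
Then status 200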
Important: If you attempt to build a URL in the form ?myparam=value by using path, the ? will get encoded into %3F. Use either the param keyword, e.g.: * param myparam = 'value' - or url: * url 'http://example.com/v1?myparam'
Because Karate strips trailing slashes if part of a path
parameter, if you want to append a forward-slash to the end of the URL in the final HTTP request - make sure that the last path
is a single '/'.
Given path 'documents', documentId, '/'
request
In-line JSON:
Given request { name: 'Billie', type: 'LOL' }
In-line XML:
And request <cat><name>Billie</name><type>Ceiling</type></cat>
From a file in the same package. Use the classpath:
prefix to load from the classpath instead.
Given request read('my-json.json')
You could always use a variable:
And request myVariable
In most cases you won't need to set the Content-Type
header
as Karate will automatically do the right thing depending on the data-type of the request
.
Defining the request
is mandatory if you are using an HTTP method
that expects a body such as post
. If you really need to have an empty body, you can use an empty string as shown below, and you can force the right Content-Type
header by using the header
keyword.
Given request ''
And header Content-Type = 'text/html'
Sending a file as the entire binary request body is easy (note that multipart
is different):
Given path 'upload'
And request read('my-image.jpg')
When method put
Then status 200
method
The HTTP verb - get
, post
, put
, delete
, patch
, options
, head
, connect
, trace
.
Lower-case is fine.
When method post
It is worth internalizing that during test-execution, it is upon the method
keyword that the actual HTTP request is issued. Which suggests that the step should be in the When
form, for example: When method post
. And steps that follow should logically be in the Then
form. Also make sure that you complete the set up of things like url
, param
, header
, configure
etc. before you fire the method
.
# set headers or params (if any) BEFORE the method step
Given header Accept = 'application/json'
When method get
# the step that immediately follows the above would typically be:
Then status 200
Although rarely needed, variable references or expressions are also supported:
* def putOrPost = (someVariable == 'dev' ? 'put' : 'post')
* method putOrPost
status
This is a shortcut to assert the HTTP response code.
Then status 200
And this assertion will cause the test to fail if the HTTP response code is something else.
See also responseStatus
if you want to do some complex assertions against the HTTP status code.
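For instance, after a request has been made, a sketch of such an assertion (assuming either status code is acceptable) would be:
* assert responseStatus == 200 || responseStatus == 201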
Keywords that set key-value pairs
They are param
, header
, cookie
, form field
and multipart field
.
The syntax will include a '=' sign between the key and the value. The key should not be within quotes.
To make dynamic data-driven testing easier, the following keywords also exist:
params
,headers
,cookies
andform fields
. They use JSON to build the relevant parts of the HTTP request.
param
Setting query-string parameters:
Given param someKey = 'hello'
And param anotherKey = someVariable
The above would result in a URL like: http://myhost/mypath?someKey=hello&anotherKey=foo
. Note that the ?
and &
will be automatically inserted.
Multi-value params are also supported:
* param myParam = ['foo', 'bar']
For convenience, a null
value will be ignored. You can also use JSON to set multiple query-parameters in one-line using params
and this is especially useful for dynamic data-driven testing.
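As a sketch of that one-line form (the keys and values are made up):
* params { someKey: 'hello', anotherKey: 'world' }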
header
You can use functions or expressions:
Given header Authorization = myAuthFunction()
And header transaction-id = 'test-' + myIdString
It is worth repeating that in most cases you won't need to set the Content-Type
header as Karate will automatically do the right thing depending on the data-type of the request
.
Because of how easy it is to set HTTP headers, Karate does not provide any special keywords for things like the Accept
header. You simply do something like this:
Given path 'some/path'
And request { some: 'data' }
And header Accept = 'application/json'
When method post
Then status 200
A common need is to send the same header(s) for every request, and configure headers (with JSON) is how you can set this up once for all subsequent requests. And if you do this within a Background: section, it would apply to all Scenario: sections within the *.feature file.
* configure headers = { 'Content-Type': 'application/xml' }
Note that Content-Type had to be enclosed in quotes in the JSON above because the "-" (hyphen character) would cause problems otherwise. Also note that "; charset=UTF-8" would be appended to the Content-Type header that Karate sends by default, and in some rare cases, you may need to suppress this behavior completely. You can do so by setting the charset to null via the configure keyword:
* configure charset = null
If you need headers to be dynamically generated for each HTTP request, use a JavaScript function with configure headers instead of JSON.
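A minimal sketch of that approach follows - the header name and the value logic here are illustrative assumptions, not part of any API:
* configure headers = function(){ return { 'x-request-id': 'req-' + java.util.UUID.randomUUID() } }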
Multi-value headers (though rarely used in the wild) are also supported:
* header myHeader = ['foo', 'bar']
Also look at the headers keyword, which uses JSON and makes some kinds of dynamic data-driven testing easier.
cookie
Setting a cookie:
Given cookie foo = 'bar'
You also have the option of setting multiple cookies in one step using the cookies keyword.
Note that any cookies returned in the HTTP response would be automatically set for any future requests. This mechanism works by calling configure cookies behind the scenes, and if you need to stop auto-adding cookies for future requests, just do this:
* configure cookies = null
Also refer to the built-in variable responseCookies for how you can access and perform assertions on cookie data values.
form field
HTML form fields are URL-encoded when the HTTP request is submitted (by the method step). You would typically use these to simulate a user sign-in and then grab a security token from the response.
Note that the Content-Type header will be automatically set to application/x-www-form-urlencoded. You just need to do a normal POST (or GET).
For example:
Given path 'login'
And form field username = 'john'
And form field password = 'secret'
When method post
Then status 200
And def authToken = response.token
A good example of the use of form field for a typical sign-in flow is this OAuth 2 demo: oauth2.feature.
Multi-values are supported the way you would expect (e.g. for simulating check-boxes and multi-selects):
* form field selected = ['apple', 'orange']
You can also dynamically set multiple fields in one step using the form fields keyword.
multipart field
Use this for building multipart named (form) field requests. This is typically combined with multipart file as shown below.
Multiple fields can be set in one step using multipart fields.
multipart file
Given multipart file myFile = { read: 'test.pdf', filename: 'upload-name.pdf', contentType: 'application/pdf' }
And multipart field message = 'hello world'
When method post
Then status 200
It is important to note that myFile above is the "field name" within the multipart/form-data request payload. This roughly corresponds to a cURL argument of -F myFile=@test.pdf.
multipart file uploads can be tricky and hard to get right. If you get stuck and ask a question on Stack Overflow, make sure you provide a cURL command that works - or else it would be very difficult for anyone to troubleshoot what you could be doing wrong. Also see this thread.
Also note that multipart file takes a JSON argument so that you can easily set the filename and the contentType (mime-type) in one step.
- read: the name of a file, and the classpath: prefix is also allowed. Mandatory unless value is used, see below.
- value: alternative to read, for rare cases where something like a JSON or XML file is being uploaded and you want to create it dynamically.
- filename: optional, if not specified there will be no filename attribute in Content-Disposition.
- contentType: optional, will default to application/octet-stream.
When 'multipart' content is involved, the Content-Type header of the HTTP request defaults to multipart/form-data. You can over-ride it by using the header keyword before the method step. Look at multipart entity for an example.
Also refer to this working demo of multipart file uploads: upload.feature.
You can also dynamically set multiple files in one step using multipart files.
multipart entity
This is technically not in the key-value form: multipart field name = 'foo', but logically belongs here in the documentation.
Use this for multipart content items that don't have field-names. Below is an example that also demonstrates using the multipart/related content-type.
Given path 'v2', 'documents'
And multipart entity read('foo.json')
And multipart field image = read('bar.jpg')
And header Content-Type = 'multipart/related'
When method post
Then status 201
Multi-Param Keywords
params, headers, cookies, form fields, multipart fields and multipart files take a single JSON argument (which can be in-line or a variable reference), and this enables certain types of dynamic data-driven testing, especially because any JSON key with a null value will be ignored. Here is a good example in the demos: dynamic-params.feature
params
* params { searchBy: 'client', active: true, someList: [1, 2, 3] }
See also param.
headers
* def someData = { Authorization: 'sometoken', tx_id: '1234', extraTokens: ['abc', 'def'] }
* headers someData
See also header.
cookies
* cookies { someKey: 'someValue', foo: 'bar' }
See also cookie.
form fields
* def credentials = { username: '#(user.name)', password: 'secret', projects: ['one', 'two'] }
* form fields credentials
See also form field.
multipart fields
And multipart fields { message: 'hello world', json: { foo: 'bar' } }
See also multipart field.
multipart files
The single JSON argument needs to be in the form { field1: { read: 'file1.ext' }, field2: { read: 'file2.ext' } } where each nested JSON is in the form expected by multipart file:
* def json = {}
* set json.myFile1 = { read: 'test1.pdf', filename: 'upload-name1.pdf', contentType: 'application/pdf' }
# if you have dynamic keys you can do this
* def key = 'myFile2'
* json[key] = { read: 'test2.pdf', filename: 'upload-name2.pdf', contentType: 'application/pdf' }
And multipart files json
SOAP
Since a SOAP request needs special handling, this is the only case where the method step is not used to actually fire the request to the server.
soap action
The name of the SOAP action specified is used as the 'SOAPAction' header. Here is an example which also demonstrates how you could assert for expected values in the response XML.
Given request read('soap-request.xml')
When soap action 'QueryUsageBalance'
Then status 200
And match response /Envelope/Body/QueryUsageBalanceResponse/Result/Error/Code == 'DAT_USAGE_1003'
And match response /Envelope/Body/QueryUsageBalanceResponse == read('expected-response.xml')
A working example of calling a SOAP service can be found within the Karate project test-suite. Refer to the demos for another example: soap.feature.
More examples are available that showcase various ways of parameterizing and dynamically manipulating SOAP requests in a data-driven fashion. Karate is quite flexible, and provides multiple options for you to evolve patterns that fit your environment, as you can see here: xml.feature.
retry until
Karate has built-in support for retrying an HTTP request until a certain condition has been met. The default setting for the max retry-attempts is 3, with a poll interval of 3000 milliseconds (3 seconds). If needed, this can be changed using configure - at any time during a test, or globally via karate-config.js:
* configure retry = { count: 10, interval: 5000 }
The retry keyword is designed to extend the existing method syntax (and should appear before a method step) like so:
Given url demoBaseUrl
And path 'greeting'
And retry until response.id > 3
When method get
Then status 200
Any JavaScript expression that uses any variable in scope can be placed after the "retry until" part. So you can refer to the response, responseStatus or even responseHeaders if needed. For example:
Given url demoBaseUrl
And path 'greeting'
And retry until responseStatus == 200 && response.id > 3
When method get
Note that it has to be a pure JavaScript expression - which means that match syntax such as contains will not work. But you can easily achieve any complex logic by using the JS API.
Refer to polling.feature for an example, and also see the alternative way to achieve polling.
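For instance, here is a sketch of using the JS API in a retry condition - it assumes the karate.match() helper, which returns a result object with a pass boolean:
Given url demoBaseUrl
And path 'greeting'
And retry until karate.match(response, { id: '#notnull' }).pass
When method get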
configure
You can adjust configuration settings for the HTTP client used by Karate using this keyword. The syntax is similar to def, but instead of a named variable, you update configuration. Here are the configuration keys supported:
Key | Type | Description |
---|---|---|
headers | JSON / JS function | See configure headers |
cookies | JSON / JS function | Just like configure headers , but for cookies. You will typically never use this, as response cookies are auto-added to all future requests. If you need to clear cookies at any time, just do configure cookies = null |
logPrettyRequest | boolean | Pretty print the request payload JSON or XML with indenting (default false ) |
logPrettyResponse | boolean | Pretty print the response payload JSON or XML with indenting (default false ) |
printEnabled | boolean | Can be used to suppress the print output when not in 'dev mode' by setting as false (default true ) |
report | JSON / boolean | see report verbosity |
afterScenario | JS function | Will be called after every Scenario (or Example within a Scenario Outline ), refer to this example: hooks.feature |
afterFeature | JS function | Will be called after every Feature , refer to this example: hooks.feature |
ssl | boolean | Enable HTTPS calls without needing to configure a trusted certificate or key-store. |
ssl | string | Like above, but force the SSL algorithm to one of these values. (The above form internally defaults to TLS if simply set to true ). |
ssl | JSON | see X509 certificate authentication |
followRedirects | boolean | Whether the HTTP client automatically follows redirects - (default true ), refer to this example. |
connectTimeout | integer | Set the connect timeout (milliseconds). The default is 30000 (30 seconds). Note that for karate-apache , this sets the socket timeout to the same value as well. |
readTimeout | integer | Set the read timeout (milliseconds). The default is 30000 (30 seconds). |
proxy | string | Set the URI of the HTTP proxy to use. |
proxy | JSON | For a proxy that requires authentication, set the uri , username and password , see example below. Also a nonProxyHosts key is supported which can take a list for e.g. { uri: 'http://my.proxy.host:8080', nonProxyHosts: ['host1', 'host2']} |
localAddress | string | see karate-gatling |
charset | string | The charset that will be sent in the request Content-Type which defaults to utf-8 . You typically never need to change this, and you can over-ride (or disable) this per-request if needed via the header keyword (example). |
retry | JSON | defaults to { count: 3, interval: 3000 } - see retry until |
callSingleCache | JSON | defaults to { minutes: 0, dir: 'target' } - see configure callSingleCache |
lowerCaseResponseHeaders | boolean | Converts every key in the responseHeaders to lower-case which makes it easier to validate or re-use |
abortedStepsShouldPass | boolean | defaults to false , whether steps after a karate.abort() should be marked as PASSED instead of SKIPPED - this can impact the behavior of 3rd-party reports, see this issue for details |
logModifier | Java Object | See Log Masking |
responseHeaders | JSON / JS function | See karate-netty |
cors | boolean | See karate-netty |
driver | JSON | See UI Automation |
driverTarget | JSON / Java Object | See configure driverTarget |
pauseIfNotPerf | boolean | defaults to false , relevant only for performance-testing, see karate.pause() and karate-gatling |
xmlNamespaceAware | boolean | defaults to false , to handle XML namespaces in some special circumstances |
Examples:
# pretty print the response payload
* configure logPrettyResponse = true
# enable ssl (and no certificate is required)
* configure ssl = true
# enable ssl and force the algorithm to TLSv1.2
* configure ssl = 'TLSv1.2'
# time-out if the response is not received within 10 seconds (after the connection is established)
* configure readTimeout = 10000
# set the uri of the http proxy server to use
* configure proxy = 'http://my.proxy.host:8080'
# proxy which needs authentication
* configure proxy = { uri: 'http://my.proxy.host:8080', username: 'john', password: 'secret' }
configure globally
If you need to set any of these "globally" you can easily do so using the karate object in karate-config.js - for e.g.:
karate.configure('ssl', true);
karate.configure('readTimeout', 5000);
In rare cases where you need to add nested non-JSON data to the configure value, you have to play by the rules that apply within karate-config.js. Here is an example of performing a configure driver step in JavaScript:
var LM = Java.type('com.mycompany.MyHttpLogModifier');
var driverConfig = { type:'chromedriver', start: false, webDriverUrl:'https://user:password@zalenium.net/wd/hub' };
driverConfig.httpConfig = karate.toMap({ logModifier: LM.INSTANCE });
karate.configure('driver', driverConfig);
By default, Karate will add logs to the report output so that HTTP requests and responses appear in-line in the HTML reports. There may be cases where you want to suppress this to make the reports "lighter" and easier to read.
The configure key here is report and it takes a JSON value. For example:
* configure report = { showLog: true, showAllSteps: false }
report | Type | Description |
---|---|---|
showLog | boolean | HTTP requests and responses (including headers) will appear in the HTML report, default true |
showAllSteps | boolean | If false , any step that starts with * instead of Given , When , Then etc. will not appear in the HTML report. The print step is an exception. Default true . |
You can 'reset' default settings by using the following short-cut:
# reset to defaults
* configure report = true
Since you can use configure any time within a test, you have control over which requests or steps you want to show / hide. This can be convenient if a particular call results in a huge response payload.
The following short-cut is also supported which will disable all logs:
* configure report = false
@report=false
When you use a re-usable feature that has commonly used utilities, you may want to hide it completely from the HTML reports. The special tag @report=false can be used, and it can even be used for only a single Scenario:
@ignore @report=false
Feature:
Scenario:
# some re-usable steps
In cases where you want to "mask" values which are sensitive from a security point of view in the output files, logs and HTML reports, you can implement the HttpLogModifier interface and tell Karate to use it via the configure keyword. Here is an example of an implementation. For performance reasons, you can implement enableForUri() so that this "activates" only for some URL patterns.
Instantiating a Java class and using this in a test is easy (see example):
# if this was in karate-config.js, it would apply "globally"
* def LM = Java.type('demo.headers.DemoLogModifier')
* configure logModifier = new LM()
Or globally, in karate-config.js:
var LM = Java.type('demo.headers.DemoLogModifier');
karate.configure('logModifier', new LM());
Since karate-config.js is processed for every Scenario, you can use a singleton instead of calling new every time. Something like this:
var LM = Java.type('demo.headers.DemoLogModifier');
karate.configure('logModifier', LM.INSTANCE);
For HTTPS / SSL, you can also specify a custom certificate or trust store by setting Java system properties. And similarly - for specifying the HTTP proxy.
Also referred to as "mutual auth" - if your API requires that clients present an X509 certificate for authentication, Karate supports this via JSON as the configure ssl value. The following parameters are supported:
Key | Type | Required? | Description |
---|---|---|---|
keyStore | string | optional | path to file containing public and private keys for your client certificate. |
keyStorePassword | string | optional | password for keyStore file. |
keyStoreType | string | optional | Format of the keyStore file. Allowed keystore types are as described in the Java KeyStore docs. |
trustStore | string | optional | path to file containing the trust chain for your server certificate. |
trustStorePassword | string | optional | password for trustStore file. |
trustStoreType | string | optional | Format of the trustStore file. Allowed keystore types are as described in the Java KeyStore docs. |
trustAll | boolean | optional | if all server certificates should be considered trusted. Default value is false . If true will allow self-signed certificates. If false , will expect the whole chain in the trustStore or use what is available in the environment. |
algorithm | string | optional | force the SSL algorithm to one of these values. Default is TLS . |
Example:
# enable X509 certificate authentication with PKCS12 file 'certstore.pfx' and password 'certpassword'
* configure ssl = { keyStore: 'classpath:certstore.pfx', keyStorePassword: 'certpassword', keyStoreType: 'pkcs12' }
# trust all server certificates, in the feature file
* configure ssl = { trustAll: true }
// trust all server certificates, global configuration in 'karate-config.js'
karate.configure('ssl', { trustAll: true });
For end-to-end examples in the Karate demos, look at the files in this folder.
Payload Assertions
Now it should be clear how Karate makes it easy to express JSON or XML. If you read from a file, the advantage is that multiple scripts can re-use the same data.
Once you have a JSON or XML object, Karate provides multiple ways to manipulate, extract or transform data. And you can easily assert that the data is as expected by comparing it with another JSON or XML object.
match
The match operation is smart because white-space does not matter, and the order of keys (or data elements) does not matter. Karate is even able to ignore fields you choose - which is very useful when you want to handle server-side dynamically generated fields such as UUID-s, time-stamps, security-tokens and the like.
The match syntax involves a double-equals sign '==' to represent a comparison (and not an assignment '=').
Since match and set go well together, they are both introduced in the examples in the section below.
set
Game, set and match - Karate !
Before you consider the set keyword - note that for simple JSON update operations, you can use eval - especially useful when the path you are trying to mutate is dynamic. Since the eval keyword can be omitted when operating on variables using JavaScript, this leads to very concise code:
* def myJson = { a: '1' }
* myJson.b = 2
* match myJson == { a: '1', b: 2 }
Refer to eval for more / advanced examples.
Setting values on JSON documents is simple using the set keyword.
* def myJson = { foo: 'bar' }
* set myJson.foo = 'world'
* match myJson == { foo: 'world' }
# add new keys. you can use pure JsonPath expressions (notice how this is different from the above)
* set myJson $.hey = 'ho'
* match myJson == { foo: 'world', hey: 'ho' }
# and even append to json arrays (or create them automatically)
* set myJson.zee[0] = 5
* match myJson == { foo: 'world', hey: 'ho', zee: [5] }
# omit the array index to append
* set myJson.zee[] = 6
* match myJson == { foo: 'world', hey: 'ho', zee: [5, 6] }
# nested json ? no problem
* set myJson.cat = { name: 'Billie' }
* match myJson == { foo: 'world', hey: 'ho', zee: [5, 6], cat: { name: 'Billie' } }
# and for match - the order of keys does not matter
* match myJson == { cat: { name: 'Billie' }, hey: 'ho', foo: 'world', zee: [5, 6] }
# you can ignore fields marked with '#ignore'
* match myJson == { cat: '#ignore', hey: 'ho', foo: 'world', zee: [5, 6] }
XML and XPath work just like you'd expect.
* def cat = <cat><name>Billie</name></cat>
* set cat /cat/name = 'Jean'
* match cat / == <cat><name>Jean</name></cat>
# you can even set whole fragments of xml
* def xml = <foo><bar>baz</bar></foo>
* set xml/foo/bar = <hello>world</hello>
* match xml == <foo><bar><hello>world</hello></bar></foo>
Refer to the section on XPath Functions for examples of advanced XPath usage.
match and variables
In case you were wondering, variables (and even expressions) are supported on the right-hand-side. So you can compare 2 JSON (or XML) payloads if you wanted to:
* def foo = { hello: 'world', baz: 'ban' }
* def bar = { baz: 'ban', hello: 'world' }
* match foo == bar
If you are wondering about the finer details of the match syntax, the left-hand-side has to be either a
- variable name, e.g. foo
- 'named' JsonPath or XPath expression on a variable, e.g. foo[0].bar or foo[*].bar
- function call, e.g. foo.bar() or foo.bar('hello').baz
- JS expression wrapped in parentheses, e.g. (foo + bar) or (42) - and in this case, variables can be used
And the right-hand-side can be any valid Karate expression. Refer to the section on JsonPath short-cuts for a deeper understanding of 'named' JsonPath expressions in Karate.
match != (not equals)
The 'not equals' operator != works as you would expect:
* def test = { foo: 'bar' }
* match test != { foo: 'baz' }
You typically will never need to use the != (not-equals) operator ! Use it sparingly, and only for string, number or simple payload comparisons.
set multiple
Karate has an elegant way to set multiple keys (via path expressions) in one step. For convenience, non-existent keys (or array elements) will be created automatically. You can find more JSON examples here: js-arrays.feature.
* def cat = { name: '' }
* set cat
| path | value |
| name | 'Bob' |
| age | 5 |
* match cat == { name: 'Bob', age: 5 }
One extra convenience for JSON is that if the variable itself (which was cat in the above example) does not exist, it will be created automatically. You can even create (or modify existing) JSON arrays by using multiple columns.
* set foo
| path | 0 | 1 |
| bar | 'baz' | 'ban' |
* match foo == [{ bar: 'baz' }, { bar: 'ban' }]
If you have to set a bunch of deeply nested keys, you can move the parent path to the top, next to the set keyword, and save a lot of typing ! Note that this is not supported for "arrays" like above, and you can have only one value column.
* set foo.bar
| path | value |
| one | 1 |
| two[0] | 2 |
| two[1] | 3 |
* match foo == { bar: { one: 1, two: [2, 3] } }
The same concept applies to XML, and you can build complicated payloads from scratch in just a few, extremely readable lines. The value column can take expressions, even XML chunks. You can find more examples here: xml.feature.
* set search /acc:getAccountByPhoneNumber
| path | value |
| acc:phone/@foo | 'bar' |
| acc:phone/acc:number[1] | 1234 |
| acc:phone/acc:number[2] | 5678 |
| acc:phoneNumberSearchOption | 'all' |
* match search ==
"""
<acc:getAccountByPhoneNumber>
<acc:phone foo="bar">
<acc:number>1234</acc:number>
<acc:number>5678</acc:number>
</acc:phone>
<acc:phoneNumberSearchOption>all</acc:phoneNumberSearchOption>
</acc:getAccountByPhoneNumber>
"""
remove
This is like the opposite of set, if you need to remove keys or data elements from JSON or XML instances. You can even remove JSON array elements by index.
* def json = { foo: 'world', hey: 'ho', zee: [1, 2, 3] }
* remove json.hey
* match json == { foo: 'world', zee: [1, 2, 3] }
* remove json $.zee[1]
* match json == { foo: 'world', zee: [1, 3] }
remove works for XML elements as well:
* def xml = <foo><bar><hello>world</hello></bar></foo>
* remove xml/foo/bar/hello
* match xml == <foo><bar/></foo>
* remove xml /foo/bar
* match xml == <foo/>
Also take a look at how a special case of embedded-expressions can remove key-value pairs from a JSON (or XML) payload: Remove if Null.
See also delete, below.
delete
For JSON, you can also use the JS delete operator via eval - useful when the path you are trying to mutate is dynamic.
* def key = 'a'
* def foo = { a: 1 }
* eval delete foo[key]
As a convenience, you can omit the eval:
* delete foo[key]
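Here is a small sketch showing that the path being mutated can be fully dynamic - the variable names are illustrative:
* def foo = { a: { b: 1, c: 2 } }
* def key = 'b'
# the key to delete is resolved at run-time
* delete foo.a[key]
* match foo == { a: { c: 2 } }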
When expressing expected results (in JSON or XML) you can mark some fields to be ignored when the match (comparison) is performed. You can even use a regular-expression so that instead of checking for equality, Karate will just validate that the actual value conforms to the expected pattern.
This means that even when you have dynamic server-side generated values such as UUID-s and time-stamps appearing in the response, you can still assert that the full-payload matched in one step.
* def cat = { name: 'Billie', type: 'LOL', id: 'a9f7a56b-8d5c-455c-9d13-808461d17b91' }
* match cat == { name: '#ignore', type: '#regex [A-Z]{3}', id: '#uuid' }
# this will fail
# * match cat == { name: '#ignore', type: '#regex .{2}', id: '#uuid' }
Note that regex escaping has to be done with a double back-slash - for e.g: '#regex a\\.dot' will match 'a.dot'
The supported markers are the following:
Marker | Description |
---|---|
#ignore | Skip comparison for this field even if the data element or JSON key is present |
#null | Expects actual value to be null , and the data element or JSON key must be present |
#notnull | Expects actual value to be not-null |
#present | Actual value can be any type or even null , but the key must be present (only for JSON / XML, see below) |
#notpresent | Expects the key to be not present at all (only for JSON / XML, see below) |
#array | Expects actual value to be a JSON array |
#object | Expects actual value to be a JSON object |
#boolean | Expects actual value to be a boolean true or false |
#number | Expects actual value to be a number |
#string | Expects actual value to be a string |
#uuid | Expects actual (string) value to conform to the UUID format |
#regex STR | Expects actual (string) value to match the regular-expression 'STR' (see examples above) |
#? EXPR | Expects the JavaScript expression 'EXPR' to evaluate to true, see self-validation expressions below |
#[NUM] EXPR | Advanced array validation, see schema validation |
#(EXPR) | For completeness, embedded expressions belong in this list as well |
Note that #present and #notpresent only make sense when you are matching within a JSON or XML context, or using a JsonPath or XPath on the left-hand-side.
* def json = { foo: 'bar' }
* match json == { foo: '#present' }
* match json.nope == '#notpresent'
The rest can also be used even in 'primitive' data matches like so:
* match foo == '#string'
# convenient (and recommended) way to check for array length
* match bar == '#[2]'
If two cross-hatch # symbols are used as the prefix (for example: ##number), it means that the key is optional or that the value can be null.
* def foo = { bar: 'baz' }
* match foo == { bar: '#string', ban: '##string' }
A very useful behavior when you combine the optional marker with an embedded expression is as follows: if the embedded expression evaluates to null, the JSON key (or XML element or attribute) will be deleted from the payload (the equivalent of remove).
* def data = { a: 'hello', b: null, c: null }
* def json = { foo: '#(data.a)', bar: '#(data.b)', baz: '##(data.c)' }
* match json == { foo: 'hello', bar: null }
If you are just trying to pre-define schema snippets to use in a fuzzy-match, you can use enclosed Javascript to suppress the default behavior of replacing placeholders. For example:
* def dogSchema = { id: '#string', color: '#string' }
# here we enclose in round-brackets to preserve the optional embedded expression
# so that it can be used later in a "match"
* def schema = ({ id: '#string', name: '#string', dog: '##(dogSchema)' })
* def response1 = { id: '123', name: 'foo' }
* match response1 == schema
And if you need to suppress placeholder substitution for read(), but still need a JSON snippet, you can do this. Note how we read as a string, but "cast" to JSON:
* json schema = karate.readAsString('schema.json')
If you want to use the triple-quote / multi-line way of defining JSON, or if you have to use XML - you can use text and "cast" to JSON or XML as a second step, before using it in a match:
* text schema =
"""
<root>
<a>#string</a>
<b>##(subSchema)</b>
</root>
"""
* xml schema = schema
#null and #notpresent
Karate's match is strict, and the case where a JSON key exists but has a null value (#null) is considered different from the case where the key is not present at all (#notpresent) in the payload.
But note that ##null can be used to represent a convention that many teams adopt, which is that keys with null values are stripped from the JSON payload. In other words, { a: 1, b: null } is considered 'equal' to { a: 1 }, and { a: 1, b: '##null' } will match both cases.
These examples (all exact matches) can make things more clear:
* def foo = { }
* match foo == { a: '##null' }
* match foo == { a: '##notnull' }
* match foo == { a: '#notpresent' }
* match foo == { a: '#ignore' }
* def foo = { a: null }
* match foo == { a: '#null' }
* match foo == { a: '##null' }
* match foo == { a: '#present' }
* match foo == { a: '#ignore' }
* def foo = { a: 1 }
* match foo == { a: '#notnull' }
* match foo == { a: '##notnull' }
* match foo == { a: '#present' }
* match foo == { a: '#ignore' }
Note that you can alternatively use JsonPath on the left-hand-side:
* def foo = { a: 1 }
* match foo.a == '#present'
* match foo.nope == '#notpresent'
But of course it is preferable to match whole objects in one step as far as possible.
The special 'predicate' marker #? EXPR in the table above is an interesting one. It is best explained via examples. Any valid JavaScript expression that evaluates to a Truthy or Falsy value is expected after the #?.
Observe how the value of the field being validated (or 'self') is injected into the 'underscore' expression variable: '_'
* def date = { month: 3 }
* match date == { month: '#? _ > 0 && _ < 13' }
What is even more interesting is that expressions can refer to variables:
* def date = { month: 3 }
* def min = 1
* def max = 12
* match date == { month: '#? _ >= min && _ <= max' }
And functions work as well ! You can imagine how you could evolve a nice set of utilities that validate all your domain objects.
* def date = { month: 3 }
* def isValidMonth = function(m) { return m >= 0 && m <= 12 }
* match date == { month: '#? isValidMonth(_)' }
Especially since strings can be easily coerced to numbers (and vice-versa) in Javascript, you can combine built-in validators with the self-validation 'predicate' form like this: '#number? _ > 0'
# given this invalid input (string instead of number)
* def date = { month: '3' }
# this will pass
* match date == { month: '#? _ > 0' }
# but this 'combined form' will fail, which is what we want
# * match date == { month: '#number? _ > 0' }
You can actually refer to any JsonPath on the document via $ and perform cross-field or conditional validations ! This example uses contains and the #? 'predicate' syntax, and situations where this comes in useful will be apparent when we discuss match each.
Given def temperature = { celsius: 100, fahrenheit: 212 }
Then match temperature == { celsius: '#number', fahrenheit: '#? _ == $.celsius * 1.8 + 32' }
# when validation logic is an 'equality' check, an embedded expression works better
Then match temperature contains { fahrenheit: '#($.celsius * 1.8 + 32)' }
match text or binary
# when the response is plain-text
Then match response == 'Health Check OK'
And match response != 'Error'
# when the response is binary (byte-array)
Then match responseBytes == read('test.pdf')
# incidentally, match and assert behave exactly the same way for strings
* def hello = 'Hello World!'
* match hello == 'Hello World!'
* assert hello == 'Hello World!'
Checking if a string is contained within another string is a very common need, and match (name) contains works just like you'd expect:
* def hello = 'Hello World!'
* match hello contains 'World'
* match hello !contains 'blah'
For case-insensitive string comparisons, see how to create custom utilities or karate.lowerCase(). And for dealing with binary content - see bytes.
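For example, a quick sketch of a case-insensitive check using karate.lowerCase():
* def hello = 'Hello World!'
* match karate.lowerCase(hello) == 'hello world!'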
match header
Since asserting against header values in the response is a common task - match header has a special meaning. It short-cuts to the pre-defined variable responseHeaders and reduces some complexity - because strictly, HTTP headers are a 'multi-valued map' or a 'map of lists', the Java-speak equivalent being Map<String, List<String>>. And since header names are case-insensitive, it ignores the case when finding the header to match.
# so after a http request
Then match header Content-Type == 'application/json'
# 'contains' works as well
Then match header Content-Type contains 'application'
Note the extra convenience where you don't have to enclose the LHS key in quotes.
You can always directly access the variable called responseHeaders if you wanted to do more checks, but you typically won't need to.
match and XML
All the fuzzy matching markers will work in XML as well. Here are some examples:
* def xml = <root><hello>world</hello><foo>bar</foo></root>
* match xml == <root><hello>world</hello><foo>#ignore</foo></root>
* def xml = <root><hello foo="bar">world</hello></root>
* match xml == <root><hello foo="#ignore">world</hello></root>
Refer to this file for a comprehensive set of XML examples: xml.feature.
match contains
In some cases where the response JSON is wildly dynamic, you may want to only check for the existence of some keys. And match (name) contains is how you can do so:
* def foo = { bar: 1, baz: 'hello', ban: 'world' }
* match foo contains { bar: 1 }
* match foo contains { baz: 'hello' }
* match foo contains { bar:1, baz: 'hello' }
# this will fail
# * match foo == { bar:1, baz: 'hello' }
Note that match contains will not "recurse" any nested JSON chunks, so use match contains deep instead.
Also note that match contains any is possible for JSON objects as well as JSON arrays.
!contains
It is sometimes useful to be able to check if a key-value-pair does not exist. This is possible by prefixing contains with a ! (with no space in between).
* def foo = { bar: 1, baz: 'hello', ban: 'world' }
* match foo !contains { bar: 2 }
* match foo !contains { huh: '#notnull' }
Here's a reminder that the #notpresent marker can be mixed into an equality match (==) to assert that some keys exist and at the same time ensure that some keys do not exist:
* def foo = { a: 1 }
* match foo == { a: '#number', b: '#notpresent' }
# if b can be present (optional) but should always be null
* match foo == { a: '#number', b: '##null' }
The ! (not) operator is especially useful for contains and JSON arrays.
* def foo = [1, 2, 3]
* match foo !contains 4
* match foo !contains [5, 6]
This is a good time to deep-dive into JsonPath, which is perfect for slicing and dicing JSON into manageable chunks. It is worth taking a few minutes to go through the documentation and examples here: JsonPath Examples.
Here are some example assertions performed while scraping a list of child elements out of the JSON below. Observe how you can match the result of a JsonPath expression with your expected data.
Given def cat =
"""
{
name: 'Billie',
kittens: [
{ id: 23, name: 'Bob' },
{ id: 42, name: 'Wild' }
]
}
"""
# normal 'equality' match. note the wildcard '*' in the JsonPath (returns an array)
Then match cat.kittens[*].id == [23, 42]
# when inspecting a json array, 'contains' just checks if the expected items exist
# and the size and order of the actual array does not matter
Then match cat.kittens[*].id contains 23
Then match cat.kittens[*].id contains [42]
Then match cat.kittens[*].id contains [23, 42]
Then match cat.kittens[*].id contains [42, 23]
# the .. operator is great because it matches nodes at any depth in the JSON "tree"
Then match cat..name == ['Billie', 'Bob', 'Wild']
# and yes, you can assert against nested objects within JSON arrays !
Then match cat.kittens contains [{ id: 42, name: 'Wild' }, { id: 23, name: 'Bob' }]
# ... and even ignore fields at the same time !
Then match cat.kittens contains { id: 42, name: '#string' }
It is worth mentioning that to do the equivalent of the last line in Java, you would typically have to traverse 2 Java Objects, one of which is within a list, and you would have to check for nulls as well.
When you use Karate, all your data assertions can be done in pure JSON and without needing a thick forest of companion Java objects. And when you read your JSON objects from (re-usable) files, even complex response payload assertions can be accomplished in just a single line of Karate-script.
Refer to this case study for how dramatic the reduction of lines of code can be.
match contains only
For those cases where you need to assert that all array elements are present, but in any order, you can do this:
* def data = { foo: [1, 2, 3] }
* match data.foo contains 1
* match data.foo contains [2]
* match data.foo contains [3, 2]
* match data.foo contains only [3, 2, 1]
* match data.foo contains only [2, 3, 1]
# this will fail
# * match data.foo contains only [2, 3]
match contains any
To assert that any of the given array elements are present:
* def data = { foo: [1, 2, 3] }
* match data.foo contains any [9, 2, 8]
And this happens to work as expected for JSON object keys as well:
* def data = { a: 1, b: 'x' }
* match data contains any { b: 'x', c: true }
match contains deep
This modifies the behavior of match contains so that nested lists or objects are processed for a "deep contains" match, instead of the default "deep equals". This is convenient for complex nested payloads where you are sure that you only want to check for some values in the various "trees" of data.
Here is an example:
Scenario: recurse nested json
* def original = { a: 1, b: 2, c: 3, d: { a: 1, b: 2 } }
* def expected = { a: 1, c: 3, d: { b: 2 } }
* match original contains deep expected
Scenario: recurse nested array
* def original = { a: 1, arr: [ { b: 2, c: 3 }, { b: 3, c: 4 } ] }
* def expected = { a: 1, arr: [ { b: 2 }, { c: 4 } ] }
* match original contains deep expected
The NOT operator, e.g. !contains deep, is not yet supported - please contribute code if you can.
match each
The match keyword can be made to iterate over all elements in a JSON array using the each modifier. Here's how it works:
* def data = { foo: [{ bar: 1, baz: 'a' }, { bar: 2, baz: 'b' }, { bar: 3, baz: 'c' }]}
* match each data.foo == { bar: '#number', baz: '#string' }
# and you can use 'contains' the way you'd expect
* match each data.foo contains { bar: '#number' }
* match each data.foo contains { bar: '#? _ != 4' }
# some more examples of validation macros
* match each data.foo contains { baz: "#? _ != 'z'" }
* def isAbc = function(x) { return x == 'a' || x == 'b' || x == 'c' }
* match each data.foo contains { baz: '#? isAbc(_)' }
# this is also possible, see the subtle difference from the above
* def isXabc = function(x) { return x.baz == 'a' || x.baz == 'b' || x.baz == 'c' }
* match each data.foo == '#? isXabc(_)'
Here is a contrived example that uses match each, contains and the #? 'predicate' marker to validate that the value of totalPrice is always equal to the roomPrice of the first item in the roomInformation array.
Given def json =
"""
{
"hotels": [
{ "roomInformation": [{ "roomPrice": 618.4 }], "totalPrice": 618.4 },
{ "roomInformation": [{ "roomPrice": 679.79}], "totalPrice": 679.79 }
]
}
"""
Then match each json.hotels contains { totalPrice: '#? _ == _$.roomInformation[0].roomPrice' }
# when validation logic is an 'equality' check, an embedded expression works better
Then match each json.hotels contains { totalPrice: '#(_$.roomInformation[0].roomPrice)' }
While $ always refers to the JSON 'root', note the use of _$ above to represent the 'current' node of a match each iteration. Here is a recap of symbols that can be used in JSON embedded expressions:
Symbol | Evaluates To |
---|---|
$ | The 'root' of the JSON document in scope |
_ | The value of 'self' |
_$ | The 'parent' of 'self' or 'current' item in the list, relevant when using match each |
There is a shortcut for match each explained in the next section that can be quite useful, especially for 'in-line' schema-like validations.
Karate provides a far simpler and more powerful way than JSON-schema to validate the structure of a given payload. You can even mix domain and conditional validations and perform all assertions in a single step.
But first, a special short-cut for array validation needs to be introduced:
* def foo = ['bar', 'baz']
# should be an array
* match foo == '#[]'
# should be an array of size 2
* match foo == '#[2]'
# should be an array of strings with size 2
* match foo == '#[2] #string'
# each array element should have a 'length' property with value 3
* match foo == '#[]? _.length == 3'
# should be an array of strings each of length 3
* match foo == '#[] #string? _.length == 3'
# should be null or an array of strings
* match foo == '##[] #string'
This 'in-line' short-cut for validating JSON arrays is similar to how match each works. So now, complex payloads (that include arrays) can easily be validated in one step by combining validation markers like so:
* def oddSchema = { price: '#string', status: '#? _ < 3', ck: '##number', name: '#regex[0-9X]' }
* def isValidTime = read('time-validator.js')
When method get
Then match response ==
"""
{
id: '#regex[0-9]+',
count: '#number',
odd: '#(oddSchema)',
data: {
countryId: '#number',
countryName: '#string',
leagueName: '##string',
status: '#number? _ >= 0',
sportName: '#string',
time: '#? isValidTime(_)'
},
odds: '#[] oddSchema'
}
"""
Especially note the re-use of the oddSchema both as an embedded-expression and as an array validation (on the last line).
And you can perform conditional / cross-field validations and even business-logic validations at the same time.
# optional (can be null) and if present should be an array of size greater than zero
* match $.odds == '##[_ > 0]'
# should be an array of size equal to $.count
* match $.odds == '#[$.count]'
# use a predicate function to validate each array element
* def isValidOdd = function(o){ return o.name.length == 1 }
* match $.odds == '#[]? isValidOdd(_)'
Refer to this for the complete example: schema-like.feature.
And there is another example in the karate-demos: schema.feature, where you can compare Karate's approach with an actual JSON-schema example. You can also find a nice visual comparison and explanation here.
contains short-cuts
Especially when payloads are complex (or highly dynamic), it may be more practical to use contains semantics. Karate has the following short-cut symbols designed to be mixed into embedded expressions:
Symbol | Means |
---|---|
^ | contains |
^^ | contains only |
^* | contains any |
^+ | contains deep |
!^ | not contains |
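For instance, here is a small sketch of the ^ (contains) short-cut embedded within a full payload match - the variable names are illustrative:
* def part = [2]
* def data = { foo: [1, 2, 3], bar: 'baz' }
# 'foo' only needs to contain the items in 'part'
* match data == { foo: '#(^part)', bar: 'baz' }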
Here's a table of the alternative 'in-line' forms compared with the 'standard' form. Note that all the short-cut forms on the right side of the table resolve to 'equality' (==) matches, which enables them to be 'in-lined' into a full (single-step) payload match, using embedded expressions.
A very useful capability is to be able to check that an array contains an object that contains the provided sub-set of keys, instead of having to specify the complete JSON - which can get really cumbersome for large objects. This turns out to be very useful in practice, and this particular match jsonArray contains '#(^partialObject)' form has no 'in-line' equivalent (see the third-from-last row above).
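A small sketch of that form - jsonArray and partialObject are illustrative names:
* def partialObject = { id: 42 }
* def jsonArray = [{ id: 23, name: 'Bob' }, { id: 42, name: 'Wild' }]
# the array contains an object that contains the given keys
* match jsonArray contains '#(^partialObject)'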
The last row in the table is a little different from the rest, and this short-cut form is the recommended way to validate the length of a JSON array. As a rule of thumb, prefer match over assert, because match failure messages are more detailed and descriptive.
In real-life tests, these are very useful when the order of items in arrays returned from the server is not guaranteed. You can easily assert that all expected elements are present, even in nested parts of your JSON - while doing a match on the full payload.
* def cat =
"""
{
name: 'Billie',
kittens: [
{ id: 23, name: 'Bob' },
{ id: 42, name: 'Wild' }
]
}
"""
* def expected = [{ id: 42, name: 'Wild' }, { id: 23, name: 'Bob' }]
* match cat == { name: 'Billie', kittens: '#(^^expected)' }
There's a lot going on in the last line above ! It validates the entire payload in one step and checks if the kittens array contains all the expected items, but in any order.
get
By now, it should be clear that JsonPath can be very useful for extracting JSON 'trees' out of a given object. The get keyword allows you to save the results of a JsonPath expression for later use - which is especially useful for dynamic data-driven testing.
* def cat =
"""
{
name: 'Billie',
kittens: [
{ id: 23, name: 'Bob' },
{ id: 42, name: 'Wild' }
]
}
"""
* def kitnums = get cat.kittens[*].id
* match kitnums == [23, 42]
* def kitnames = get cat $.kittens[*].name
* match kitnames == ['Bob', 'Wild']
get short-cut
The 'short cut' $variableName form is also supported. Refer to JsonPath short-cuts for a detailed explanation. So the above could be re-written as follows:
* def kitnums = $cat.kittens[*].id
* match kitnums == [23, 42]
* def kitnames = $cat.kittens[*].name
* match kitnames == ['Bob', 'Wild']
It is worth repeating that the above can be condensed into 2 lines. Note that since only JsonPath is expected on the left-hand-side of the == sign of a match statement, you don't need to prefix the variable reference with $:
* match cat.kittens[*].id == [23, 42]
* match cat.kittens[*].name == ['Bob', 'Wild']
# if you prefer using 'pure' JsonPath, you can do this
* match cat $.kittens[*].id == [23, 42]
* match cat $.kittens[*].name == ['Bob', 'Wild']
get plus index
A convenience that the get syntax supports (but not the $ short-cut form) is to return a single element if the right-hand-side evaluates to a list-like result (e.g. a JSON array). This is useful because the moment you use a wildcard [*] or search filter in JsonPath (see the next section), you get an array back - even though typically you would only be interested in the first item.
* def actual = 23
# so instead of this
* def kitnums = get cat.kittens[*].id
* match actual == kitnums[0]
# you can do this in one line
* match actual == get[0] cat.kittens[*].id
JsonPath filter expressions are very useful for extracting elements that meet some filter criteria out of arrays.
* def cat =
"""
{
name: 'Billie',
kittens: [
{ id: 23, name: 'Bob' },
{ id: 42, name: 'Wild' }
]
}
"""
# find single kitten where id == 23
* def bob = get[0] cat.kittens[?(@.id==23)]
* match bob.name == 'Bob'
# using the karate object if the expression is dynamic
* def temp = karate.jsonPath(cat, "$.kittens[?(@.name=='" + bob.name + "')]")
* match temp[0] == bob
# or alternatively
* def temp = karate.jsonPath(cat, "$.kittens[?(@.name=='" + bob.name + "')]")[0]
* match temp == bob
You usually won't need this, but the second-last line above shows how the karate object can be used to evaluate JsonPath if the filter expression depends on a variable. If you find yourself struggling to write dynamic JsonPath filters, look at karate.filter() as an alternative, described just below.
Karate supports the following functional-style operations via the JS API - karate.map(), karate.filter() and karate.forEach(). They can be very useful in some situations. A good example is when you have the expected data available as ready-made JSON, but it is in a different "shape" from the actual data or HTTP response. There is also a karate.mapWithKey() for a common need - converting an array of primitives into an array of objects, which is the form that data-driven features expect.
A few more useful "transforms" are selecting a sub-set of key-value pairs using karate.filterKeys(), merging 2 or more JSON-s using karate.merge(), and combining 2 or more arrays (or objects) into a single array using karate.append(). And karate.appendTo() is for updating an existing variable (the equivalent of array.push() in JavaScript), which is especially useful in the body of a karate.forEach().
You can also sort arrays of arbitrary JSON using karate.sort(). Simple arrays of strings or numbers can be stripped of duplicates using karate.distinct(). All JS "native" array operations can be used, such as someName.reverse().
Note that a single JS function is sufficient to transform a given JSON object into a completely new one, and you can use complex conditional logic if needed.
Scenario: karate map operation
* def fun = function(x){ return x * x }
* def list = [1, 2, 3]
* def res = karate.map(list, fun)
* match res == [1, 4, 9]
Scenario: convert an array into a different shape
* def before = [{ foo: 1 }, { foo: 2 }, { foo: 3 }]
* def fun = function(x){ return { bar: x.foo } }
* def after = karate.map(before, fun)
* match after == [{ bar: 1 }, { bar: 2 }, { bar: 3 }]
Scenario: convert array of primitives into array of objects
* def list = [ 'Bob', 'Wild', 'Nyan' ]
* def data = karate.mapWithKey(list, 'name')
* match data == [{ name: 'Bob' }, { name: 'Wild' }, { name: 'Nyan' }]
Scenario: karate filter operation
* def fun = function(x){ return x % 2 == 0 }
* def list = [1, 2, 3, 4]
* def res = karate.filter(list, fun)
* match res == [2, 4]
Scenario: forEach works even on object key-values, not just arrays
* def keys = []
* def vals = []
* def idxs = []
* def fun =
"""
function(x, y, i) {
karate.appendTo(keys, x);
karate.appendTo(vals, y);
karate.appendTo(idxs, i);
}
"""
* def map = { a: 2, b: 4, c: 6 }
* karate.forEach(map, fun)
* match keys == ['a', 'b', 'c']
* match vals == [2, 4, 6]
* match idxs == [0, 1, 2]
Scenario: filterKeys
* def schema = { a: '#string', b: '#number', c: '#boolean' }
* def response = { a: 'x', c: true }
# very useful for validating a response against a schema "super-set"
* match response == karate.filterKeys(schema, response)
* match karate.filterKeys(response, 'b', 'c') == { c: true }
* match karate.filterKeys(response, ['a', 'b']) == { a: 'x' }
Scenario: merge
* def foo = { a: 1 }
* def bar = karate.merge(foo, { b: 2 })
* match bar == { a: 1, b: 2 }
Scenario: append
* def foo = [{ a: 1 }]
* def bar = karate.append(foo, { b: 2 })
* match bar == [{ a: 1 }, { b: 2 }]
Scenario: sort
* def foo = [{a: { b: 3 }}, {a: { b: 1 }}, {a: { b: 2 }}]
* def fun = function(x){ return x.a.b }
* def bar = karate.sort(foo, fun)
* match bar == [{a: { b: 1 }}, {a: { b: 2 }}, {a: { b: 3 }}]
* match bar.reverse() == [{a: { b: 3 }}, {a: { b: 2 }}, {a: { b: 1 }}]
Given the examples above, it has to be said that a best practice with Karate is to avoid JavaScript for loops as far as possible. A common requirement is to build an array with n elements, or do something n times, where n is an integer (that could even be a variable reference). This is easily achieved with the karate.repeat() API:
* def fun = function(i){ return i * 2 }
* def foo = karate.repeat(5, fun)
* match foo == [0, 2, 4, 6, 8]
* def foo = []
* def fun = function(i){ karate.appendTo(foo, i) }
* karate.repeat(5, fun)
* match foo == [0, 1, 2, 3, 4]
# generate test data easily
* def fun = function(i){ return { name: 'User ' + (i + 1) } }
* def foo = karate.repeat(3, fun)
* match foo == [{ name: 'User 1' }, { name: 'User 2' }, { name: 'User 3' }]
# generate a range of numbers as a json array
* def foo = karate.range(4, 9)
* match foo == [4, 5, 6, 7, 8, 9]
And there's also karate.range(), which can be useful to generate test-data.
Don't forget that Karate's data-driven testing capabilities can loop over arrays of JSON objects automatically.
When handling XML, you sometimes need to call XPath functions, for example to get the count of a node-set. Any valid XPath expression is allowed on the left-hand-side of a match statement.
* def myXml =
"""
<records>
<record index="1">a</record>
<record index="2">b</record>
<record index="3" foo="bar">c</record>
</records>
"""
* match myXml count(/records//record) == 3
* match myXml //record[@index=2] == 'b'
* match myXml //record[@foo='bar'] == 'c'
Some XPath expressions return a list of nodes (instead of a single node). But since you can express a list of data-elements as a JSON array, even these XPath expressions can be used in match statements.
* def teachers =
"""
<teachers>
<teacher department="science">
<subject>math</subject>
<subject>physics</subject>
</teacher>
<teacher department="arts">
<subject>political education</subject>
<subject>english</subject>
</teacher>
</teachers>
"""
* match teachers //teacher[@department='science']/subject == ['math', 'physics']
If your XPath is dynamic and has to be formed 'on the fly', perhaps by using some variable derived from previous steps, you can use the karate.xmlPath() helper:
* def xml = <query><name><foo>bar</foo></name></query>
* def elementName = 'name'
* def name = karate.xmlPath(xml, '/query/' + elementName + '/foo')
* match name == 'bar'
* def queryName = karate.xmlPath(xml, '/query/' + elementName)
* match queryName == <name><foo>bar</foo></name>
You can refer to this file (which is part of the Karate test-suite) for more XML examples: xml-and-xpath.feature
Special Variables
These are 'built-in' variables; there are only a few, and all of them give you access to the HTTP response.
response
After every HTTP call this variable is set with the response body, and is available until the next HTTP request over-writes it. You can easily assign the whole response (or just parts of it, using JsonPath or XPath) to a variable, and use it in later steps.
The response is automatically available as a JSON, XML or String object depending on what the response contents are.
As a short-cut, when running JsonPath expressions, $ represents the response. This has the advantage that you can use pure JsonPath and be more concise. For example:
# the three lines below are equivalent
Then match response $ == { name: 'Billie' }
Then match response == { name: 'Billie' }
Then match $ == { name: 'Billie' }
# the three lines below are equivalent
Then match response.name == 'Billie'
Then match response $.name == 'Billie'
Then match $.name == 'Billie'
And similarly for XML and XPath, '/' represents the response:
# the four lines below are equivalent
Then match response / == <cat><name>Billie</name></cat>
Then match response/ == <cat><name>Billie</name></cat>
Then match response == <cat><name>Billie</name></cat>
Then match / == <cat><name>Billie</name></cat>
# the three lines below are equivalent
Then match response /cat/name == 'Billie'
Then match response/cat/name == 'Billie'
Then match /cat/name == 'Billie'
The $varName form is used on the right-hand-side of Karate expressions, and is slightly different from pure JsonPath expressions, which always begin with $. or $[. Here is a summary of what the different 'shapes' mean in Karate:
Shape | Description |
---|---|
$.bar | Pure JsonPath equivalent of $response.bar where response is a JSON object |
$[0] | Pure JsonPath equivalent of $response[0] where response is a JSON array |
$foo.bar | Evaluates the JsonPath $.bar on the variable foo which is a JSON object or map-like |
$foo[0] | Evaluates the JsonPath $[0] on the variable foo which is a JSON array or list-like |
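For example, a small sketch of the variable-prefixed shape in action - the names are illustrative:
* def foo = { bar: 'baz' }
# evaluates the JsonPath $.bar on the variable foo
* def val = $foo.bar
* match val == 'baz'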
There is no need to prefix variable names with $ on the left-hand-side of match statements because it is implied. You can if you want to, but since only JsonPath (on variables) is allowed here, Karate ignores the $ and looks only at the variable name. None of the examples in the documentation use the $varName form on the LHS, and this is the recommended best-practice.
responseBytes
This will always hold the contents of the response as a byte-array. This is rarely used, unless you are expecting binary content returned by the server. The match keyword will work as you expect. Here is an example: binary.feature.
responseCookies
The responseCookies variable is set upon any HTTP response, and is a map-like (or JSON-like) object. It can be easily inspected or used in expressions.
* assert responseCookies['my.key'].value == 'someValue'
# karate's unified data handling means that even 'match' works
* match responseCookies contains { time: '#notnull' }
# ... which means that checking if a cookie does NOT exist is a piece of cake
* match responseCookies !contains { blah: '#notnull' }
# save a response cookie for later use
* def time = responseCookies.time.value
As a convenience, cookies from the previous response are collected and passed as-is as part of the next HTTP request. This is what is normally expected, and simulates a web-browser - which makes it easy to script things like HTML-form based authentication into test-flows. Refer to the documentation for cookie for details on how you can disable this if need be.
Each item within responseCookies is itself a 'map-like' object. Typically you would examine the value property as in the example above, but domain and path are also available.
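For example, a sketch of reading those other attributes - the cookie name is illustrative:
* def cookieDomain = responseCookies['my.key'].domain
* def cookiePath = responseCookies['my.key'].path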
responseHeaders
See also match header, which is what you would normally need.
But if you need to use values in the response headers - they will be in a variable named responseHeaders. Note that it is a 'map of lists', so you will need to do things like this:
* def contentType = responseHeaders['Content-Type'][0]
And just as in the responseCookies
example above, you can use match
to run complex validations on the responseHeaders
.
Finally, using karate.responseHeader() can be simpler when you just need a header value string by name, and it will ignore-case for the name passed as the argument:
* match karate.responseHeader('content-type') == 'application/json'
responseStatus
You would normally only need to use the status
keyword. But if you really need to use the HTTP response code in an expression or save it for later, you can get it as an integer:
* def uploadStatusCode = responseStatus
# check if the response status is either of two values
Then assert responseStatus == 200 || responseStatus == 204
Note that match
can give you some extra readable options:
* match [200, 201, 204] contains responseStatus
# this may be sufficient to check a range of values
* assert responseStatus >= 200
* assert responseStatus < 300
# but using karate.range() you can even do this !
* match karate.range(200, 299) contains responseStatus
responseTime
The response time (in milliseconds) for the current response
would be available in a variable called responseTime
. You can use this to assert that it was returned within the expected time like so:
When method post
Then status 201
And assert responseTime < 1000
responseType
Karate will attempt to parse the raw HTTP response body as JSON or XML and make it available as the response
value. If parsing fails, Karate will log a warning and the value of response
will then be a plain string. You can still perform string comparisons such as a match contains
and look for error messages etc. In rare cases, you may want to check what the "type" of the response
is and it can be one of 3 different values: json
, xml
and string
.
So if you really wanted to assert that the HTTP response body is well-formed JSON or XML you can do this:
When method post
Then status 201
And match responseType == 'json'
requestTimeStamp
Very rarely used - but you can get the Java system-time (for the current response
) at the point when the HTTP request was initiated (the value of System.currentTimeMillis()
) which can be used for detailed logging or custom framework / stats calculations.
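For instance, a small sketch that logs both values for custom stats (purely illustrative):
* print 'request initiated at (epoch millis):', requestTimeStamp
* print 'response time (ms):', responseTime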
HTTP Header Manipulation
configure headers
Custom header manipulation for every HTTP request is something that Karate makes very easy and pluggable. For every HTTP request made from Karate, the internal flow is as follows:
- did we configure the value of headers ?
- if the configured value is a JavaScript function, a call is made to that function, and the JSON key-values it returns are added as headers to the HTTP request

This makes setting up of complex authentication schemes for your test-flows really easy. It typically ends up being a one-liner that appears in the Background section at the start of your test-scripts. You can re-use the function you create across your whole project.
section at the start of your test-scripts. You can re-use the function you create across your whole project.
Here is an example JavaScript function that uses some variables in the context (which have been possibly set as the result of a sign-in) to build the Authorization
header. Note how even calls to Java code can be made if needed.
In the example below, note the use of the karate.get() helper for getting the value of a dynamic variable (which was not set at the time this JS function was declared). This is preferred because it takes care of situations such as if the value is undefined in JavaScript. In rare cases you may need to set a variable from this routine, and a good example is to make the generated UUID "visible" to the currently executing script or feature. You can easily do this via karate.set('someVarName', value).
function fn() {
var uuid = '' + java.util.UUID.randomUUID(); // convert to string
var out = { // so now the txid_header would be a unique uuid for each request
txid_header: uuid,
ip_header: '123.45.67.89', // hard coded here, but also can be as dynamic as you want
};
var authString = '';
var authToken = karate.get('authToken'); // use the 'karate' helper to do a 'safe' get of a 'dynamic' variable
if (authToken) { // and if 'authToken' is not null ...
authString = ',auth_type=MyAuthScheme'
+ ',auth_key=' + authToken.key
+ ',auth_user=' + authToken.userId
+ ',auth_project=' + authToken.projectId;
}
// the 'appId' variable here is expected to have been set via karate-config.js (bootstrap init) and will never change
out['Authorization'] = 'My_Auth app_id=' + appId + authString;
return out;
}
Assuming the above code is in a file called my-headers.js
, the next section on calling other feature files shows how it looks like in action at the beginning of a test script.
Notice how once the authToken
variable is initialized, it is used by the above function to generate headers for every HTTP call made as part of the test flow.
If a few steps in your flow need to temporarily change (or completely bypass) the currently-set header-manipulation scheme, just update configure headers
to a new value (or set it to null
) in the middle of a script. Then use the header
keyword to do a custom 'over-ride' if needed.
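For example, a sketch of temporarily bypassing the scheme mid-script (the path is illustrative):
# temporarily bypass the configured header scheme for one request
* configure headers = null
Given path 'health'
When method get
Then status 200
# restore the scheme for the rest of the flow
* configure headers = read('classpath:my-headers.js')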
The karate-demo has an example showing various ways to configure
or set headers: headers.feature
The karate
object
A JavaScript function or Karate expression at runtime has access to a utility object in a variable named: karate
. This provides the following methods:
Operation | Description |
---|---|
karate.abort() | you can prematurely exit a Scenario by combining this with conditional logic like so: * if (condition) karate.abort() - please use sparingly ! and also see configure abortedStepsShouldPass |
karate.append(... items) | useful to create lists out of items (which can be lists as well), see JSON transforms |
karate.appendTo(name, ... items) | useful to append to a list-like variable (that has to exist) in scope, see JSON transforms - the first argument can be a reference to an array-like variable or even the name (string) of an existing variable which is list-like |
karate.call(fileName, [arg]) | invoke a *.feature file or a JavaScript function the same way that call works (with an optional solitary argument), see call() vs read() for details |
karate.callSingle(fileName, [arg]) | like the above, but guaranteed to run only once even across multiple features - see karate.callSingle() |
karate.configure(key, value) | does the same thing as the configure keyword, and a very useful example is to do karate.configure('connectTimeout', 5000); in karate-config.js - which has the 'global' effect of not wasting time if a connection cannot be established within 5 seconds |
karate.distinct(list) | returns only unique items out of an array of strings or numbers |
karate.doc(arg) | just like karate.render() but will insert the HTML into the report |
karate.embed(object, mimeType) | embeds the object (can be raw bytes or an image) into the JSON report output, see this example |
karate.env | gets the value (read-only) of the environment property 'karate.env', and this is typically used for bootstrapping configuration |
karate.eval(expression) | for really advanced needs, you can programmatically generate a snippet of JavaScript which can be evaluated at run-time, you can find an example here |
karate.exec(command) | convenient way to execute an OS specific command and return the console output e.g. karate.exec('some.exe -h') (or karate.exec(['some.exe', '-h']) ) useful for calling non-Java code (that can even return data) or for starting user-interfaces to be automated, this command will block until the process terminates, also see karate.fork() |
karate.extract(text, regex, group) | useful to "scrape" text out of non-JSON or non-XML text sources such as HTML, group follows the Java regex rules, see this example |
karate.extractAll(text, regex, group) | like the above, but returns a list of text-matches |
karate.fail(message) | if you want to conditionally stop a test with a descriptive error message, e.g. * if (condition) karate.fail('we expected something else') |
karate.feature | get metadata about the currently executing feature within a test |
karate.filter(list, predicate) | functional-style 'filter' operation useful to filter list-like objects (e.g. JSON arrays), see example, the second argument has to be a JS function (item, [index]) that returns a boolean |
karate.filterKeys(map, keys) | extracts a sub-set of key-value pairs from the first argument, the second argument can be a list (or varargs) of keys - or even another JSON where only the keys would be used for extraction, example |
karate.forEach(list, function) | functional-style 'loop' operation useful to traverse list-like (or even map-like) objects (e.g. JSON / arrays), see example, the second argument has to be a JS function (item, [index]) for lists and (key, [value], [index]) for JSON / maps |
karate.fork(map) | executes an OS command, but forks a process in parallel and will not block the test like karate.exec() e.g. karate.fork({ args: ['some.exe', '-h'] }) or karate.fork(['some.exe', '-h']) - you can use a composite string as line (or the solitary argument e.g. karate.fork('some.exe -h') ) instead of args , and an optional workingDir string property and env JSON / map is also supported - this returns a Command object which has operations such as waitSync() and close() if you need more control, more details here |
karate.fromString(string) | for advanced conditional logic for e.g. when a string coming from an external process is dynamic - and whether it is JSON or XML is not known in advance, see example |
karate.get(name, [default]) | get the value of a variable by name (or JsonPath expression), if not found - this returns null which is easier to handle in JavaScript (than undefined ), and an optional (literal / constant) second argument can be used to return a "default" value, very useful to set variables in called features that have not been pre-defined |
karate.http(url) | returns a convenience Http request builder class, only recommended for advanced use |
karate.jsonPath(json, expression) | brings the power of JsonPath into JavaScript, and you can find an example here. |
karate.keysOf(object) | returns only the keys of a map-like object |
karate.log(... args) | log to the same logger (and log file) being used by the parent process, logging can be suppressed with configure printEnabled set to false , and just like print - use comma-separated values to "pretty print" JSON or XML |
karate.logger.debug(... args) | access to the Karate logger directly and log in debug. Might be desirable instead of karate.log or print when looking to reduce the logs in console in your CI/CD pipeline but still retain the information for reports. See Logging for additional details. |
karate.lowerCase(object) | useful to brute-force all keys and values in a JSON or XML payload to lower-case, useful in some cases, see example |
karate.map(list, function) | functional-style 'map' operation useful to transform list-like objects (e.g. JSON arrays), see example, the second argument has to be a JS function (item, [index]) |
karate.mapWithKey(list, string) | convenient for the common case of transforming an array of primitives into an array of objects, see JSON transforms |
karate.match(actual, expected) | brings the power of the fuzzy match syntax into Karate-JS, returns a JSON in the form { pass: '#boolean', message: '#string' } and you can find an example here - you can even place a full match expression like this: karate.match("each foo contains { a: '#number' }") |
karate.merge(... maps) | useful to merge the key-values of two (or more) JSON (or map-like) objects, see JSON transforms |
karate.os | returns the operating system details as JSON, for e.g. { type: 'macosx', name: 'Mac OS X' } - useful for writing conditional logic, the possible type -s being: macosx , windows , linux and unknown |
karate.pause(number) | sleep time in milliseconds, relevant only for performance-testing - and will be a no-op otherwise unless configure pauseIfNotPerf is true |
karate.pretty(value) | return a 'pretty-printed', nicely indented string representation of the JSON value, also see: print |
karate.prettyXml(value) | return a 'pretty-printed', nicely indented string representation of the XML value, also see: print |
karate.prevRequest | for advanced users, you can inspect the actual HTTP request after it happens, useful if you are writing a framework over Karate, refer to this example: request.feature |
karate.properties[key] | get the value of any Java system-property by name, useful for advanced custom configuration |
karate.range(start, end, [interval]) | returns a JSON array of integers (inclusive), the optional third argument must be a positive integer and defaults to 1, and if start is greater than end, the order of values is reversed |
karate.read(filename) | the same read() function - which is pre-defined even within JS blocks, so there is no need to ever do karate.read() , and just read() is sufficient |
karate.readAsBytes(filename) | rarely used, like karate.readAsString - but returns a byte array |
karate.readAsStream(filename) | rarely used, like karate.readAsString - but returns a Java InputStream |
karate.readAsString(filename) | rarely used, behaves exactly like read - but does not auto convert to JSON or XML |
karate.remove(name, path) | very rarely used - when needing to perform conditional removal of JSON keys or XML nodes. Behaves the same way as the remove keyword. |
karate.render(arg) | renders an HTML template, the arg can be a string (prefixable path to the HTML) or a JSON that takes either a path or html property, see doc |
karate.repeat(count, function) | useful for building an array with count items or doing something count times, refer this example. Also see loops. |
karate.responseHeader(string) | returns the response HTTP header value (as a single string) for the given name, and will ignore-case, and can be simpler than using responseHeaders |
karate.scenario | get metadata about the currently executing Scenario (or Outline - Example ) within a test |
karate.set(name, value) | sets the value of a variable (immediately), which may be needed in case any other routines (such as the configured headers) depend on that variable |
karate.set(object) | where the single argument is expected to be a Map or JSON-like, and will perform the above karate.set() operation for all key-value pairs in one-shot, see example |
karate.set(name, path, value) | only needed when you need to conditionally build payload elements, especially XML. This is best explained via an example, and it behaves the same way as the set keyword. Also see eval . |
karate.setXml(name, xmlString) | rarely used, refer to the example above |
karate.signal(result) | trigger an event that karate.listen(timeout) is waiting for, and pass the data, see async |
karate.sizeOf(object) | returns the size of the map-like or list-like object |
karate.sort(list, function) | sorts the list using the provided custom function called for each item in the list (and the optional second argument is the item index) e.g. karate.sort(myList, x => x.val) , and the second / function argument is not needed if the list is of plain strings or numbers |
karate.start() | only for starting a mock from within a test / feature file see mocks |
karate.stop(port) | will pause the test execution until a socket connection (even HTTP GET ) is made to the port logged to the console, useful for troubleshooting UI tests without using a de-bugger, of course - NEVER forget to remove this after use ! |
karate.target(object) | currently for web-ui automation only, see target lifecycle |
karate.tags | for advanced users - scripts can introspect the tags that apply to the current scope, refer to this example: tags.feature |
karate.tagValues | for even more advanced users - Karate natively supports tags in a @name=val1,val2 format, and there is an inheritance mechanism where Scenario level tags can over-ride Feature level tags, refer to this example: tags.feature |
karate.toAbsolutePath(relativePath) | when you want to get the absolute OS path to the argument which could even have a prefix such as classpath: , e.g. karate.toAbsolutePath('some.json') |
karate.toBean(json, className) | converts a JSON string or map-like object into a Java object, given the Java class name as the second argument, refer to this file for an example |
karate.toCsv(list) | converts a JSON array (of objects) or a list-like object into a CSV string, writing this to a file is your responsibility or you could use karate.write() |
karate.toJava(function) | rarely used, when you need to pass a JS function to custom Java code, typically for Async, and another edge case is to convert a JSON array or object to a Java List or Map , see example |
karate.toJson(object) | converts a Java object into JSON, and karate.toJson(object, true) will strip all keys that have null values from the resulting JSON, convenient for unit-testing Java code, see example |
karate.typeOf(any) | for advanced conditional logic when object types are dynamic and not known in advance, see example |
karate.urlDecode(string) | URL decode |
karate.urlEncode(string) | URL encode |
karate.valuesOf(object) | returns only the values of a map-like object (or itself if a list-like object) |
karate.waitForHttp(url) | will wait until the URL is ready to accept HTTP connections |
karate.waitForPort(host, port) | will wait until the host:port is ready to accept socket connections |
karate.webSocket(url, handler) | see websocket |
karate.write(object, path) | normally not recommended, please read this first - writes the bytes of object to a path which will always be relative to the "build" directory (typically target ), see this example: embed-pdf.js - and this method returns a java.io.File reference to the file created / written to |
karate.xmlPath(xml, expression) | Just like karate.jsonPath() - but for XML, and allows you to use dynamic XPath if needed, see example. |
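To make the table concrete, here is a small sketch exercising a few of these methods (data is illustrative):
* def nums = [1, 2, 2, 3]
* match karate.distinct(nums) == [1, 2, 3]
* match karate.sizeOf(nums) == 4
* def doubled = karate.map(nums, function(x){ return x * 2 })
* match doubled == [2, 4, 4, 6]
* def evens = karate.filter(nums, function(x){ return x % 2 == 0 })
* match evens == [2, 2]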
Code Reuse / Common Routines
call
In any complex testing endeavor, you would find yourself needing 'common' code that needs to be re-used across multiple test scripts. A typical need would be to perform a 'sign in', or create a fresh user as a pre-requisite for the scenarios being tested.
There are two types of code that can be call
-ed. *.feature
files and JavaScript functions.
*.feature files
When you have a sequence of HTTP calls that need to be repeated for multiple test scripts, Karate allows you to treat a *.feature
file as a re-usable unit. You can also pass parameters into the *.feature
file being called, and extract variables out of the invocation result.
Here is an example of using the call
keyword to invoke another feature file, loaded using the read
function:
If you find this hard to understand at first, try looking at this set of examples.
Feature: which makes a 'call' to another re-usable feature
Background:
* configure headers = read('classpath:my-headers.js')
* def signIn = call read('classpath:my-signin.feature') { username: 'john', password: 'secret' }
* def authToken = signIn.authToken
Scenario: some scenario
# main test steps
Note that def can be used to assign a feature to a variable. For example look at how "creator" has been defined in the Background in this example, and used later in a call statement. This is very close to how "custom keywords" work in other frameworks. See this other example for more ideas: dsl.feature.
The contents of my-signin.feature
are shown below. A few points to note:
- The 'called' feature inherits the variables and configure settings of the 'calling' feature, so for example loginUrlBase in the example below would be available to use.
- When you use def in the 'called' feature, it will not over-write variables in the 'calling' feature (unless you explicitly choose to use shared scope). But note that JSON, XML, Map-like or List-like variables are 'passed by reference' which means that 'called' feature steps can update or 'mutate' them using the set keyword. Use the copy keyword to 'clone' a JSON or XML payload if needed, and refer to this example for more details: copy.feature.
- Within the 'called' feature, the call argument is also available as __arg.
- All variables that were defined (using def) in the 'called' script would be returned as 'keys' within a JSON-like object. Note that this includes 'built-in' variables, which means that things like the last value of response would also be present. In the example above you can see that the JSON 'envelope' returned is assigned to the variable named signIn. And then getting hold of any data that was generated by the 'called' script is as simple as accessing it by name, for example signIn.authToken as shown above.

Note that only variables and configuration settings will be passed. You can't do things such as
* url 'http://foo.bar'
and expect the URL to be set in the "called" feature. Use a variable in the "called" feature instead, e.g. * url myUrl.
Feature: here are the contents of 'my-signin.feature'
Scenario:
Given url loginUrlBase
And request { userId: '#(username)', userPass: '#(password)' }
When method post
Then status 200
And def authToken = response
# second HTTP call, to get a list of 'projects'
Given path 'users', authToken.userId, 'projects'
When method get
Then status 200
# logic to 'choose' first project
And set authToken.projectId = response.projects[0].projectId;
The above example actually makes two HTTP requests - the first is a standard 'sign-in' POST and then (for illustrative purposes) another HTTP call (a GET) is made for retrieving a list of projects for the signed-in user, and the first one is 'selected' and added to the returned 'auth token' JSON object.
So you get the picture, any kind of complicated 'sign-in' flow can be scripted and re-used.
If the second HTTP call above expects headers to be set by my-headers.js - which in turn depends on the authToken variable being updated, you will need to duplicate the line * configure headers = read('classpath:my-headers.js') from the 'caller' feature here as well. The above example does not use shared scope, which means that the variables in the 'calling' (parent) feature are not shared by the 'called' my-signin.feature. The above example can be made simpler with the use of call (or callonce) without a def-assignment to a variable, and this is the recommended pattern for implementing re-usable authentication setup flows.
Do look at the documentation and example for configure headers
also as it goes hand-in-hand with call
. In the above example, the end-result of the call
to my-signin.feature
resulted in the authToken
variable being initialized. Take a look at how the configure headers
example uses the authToken
variable.
You can "select" a single Scenario
(or Scenario
-s or Scenario Outline
-s or even specific Examples
rows) by appending a "tag selector" at the end of the feature-file you are calling. For example:
call read('classpath:my-signin.feature@name=someScenarioName')
While the tag does not need to be in the @key=value
form, it is recommended for readability when you start getting into the business of giving meaningful names to your Scenario
-s.
This "tag selection" capability is designed for you to be able to "compose" flows out of existing test-suites when using the Karate Gatling integration. Normally we recommend that you keep your "re-usable" features lightweight - by limiting them to just one Scenario
.
As a convenience, you can call a tag directly, which is a short-cut to call another Scenario
within the same feature file. Note that you would typically want to use the @ignore
tag for such cases.
Scenario: one
* call read('@two')
@ignore @two
Scenario: two
* print 'called'
If the argument passed to the call of a *.feature
file is a JSON array, something interesting happens. The feature is invoked for each item in the array. Each array element is expected to be a JSON object, and for each object - the behavior will be as described above.
But this time, the return value from the call
step will be a JSON array of the same size as the input array. And each element of the returned array will be the 'envelope' of variables that resulted from each iteration where the *.feature
got invoked.
Here is an example that combines the table
keyword with calling a *.feature
. Observe how the get
shortcut is used to 'distill' the result array of variable 'envelopes' into an array consisting only of response
payloads.
* table kittens
| name | age |
| 'Bob' | 2 |
| 'Wild' | 1 |
| 'Nyan' | 3 |
* def result = call read('cat-create.feature') kittens
* def created = $result[*].response
* match each created == { id: '#number', name: '#string', age: '#number' }
* match created[*].name contains only ['Bob', 'Wild', 'Nyan']
And here is how cat-create.feature could look:
@ignore
Feature:
Scenario:
Given url someUrlFromConfig
And path 'cats'
And request { name: '#(name)', age: '#(age)' }
When method post
Then status 200
If you replace the table
with perhaps a JavaScript function call that gets some JSON data from some data-source, you can imagine how you could go about dynamic data-driven testing.
Although it is just a few lines of code, take time to study the above example carefully. It is a great example of how to effectively use the unique combination of Cucumber and JsonPath that Karate provides.
Also look at the demo examples, especially dynamic-params.feature
- to compare the above approach with how the Cucumber Scenario Outline:
can be alternatively used for data-driven tests.
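To make the dynamic data-driven idea above concrete, here is a sketch where the rows come from a JS function instead of a table (the function body is illustrative; in real life it could read from any data-source):
* def getKittens = function(){ return [{ name: 'Bob', age: 2 }, { name: 'Wild', age: 1 }] }
* def kittens = getKittens()
* def result = call read('cat-create.feature') kittens
* def created = $result[*].response
* match created[*].name contains only ['Bob', 'Wild']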
call
Although all properties in the passed JSON-like argument are 'unpacked' into the current scope as separate 'named' variables, it sometimes makes sense to access the whole argument and this can be done via __arg
. And if being called in a loop, a built-in variable called __loop
will also be available that will hold the value of the current loop index. So you can do things like this: * def name = name + __loop
- or you can use the loop index value for looking up other values that may be in scope - in a data-driven style.
Variable | Refers To |
---|---|
__arg | the single call (or callonce ) argument, will be null if there was none |
__loop | the current iteration index (starts from 0) if being called in a loop, will be -1 if not |
Refer to this demo feature for an example: kitten-create.feature
Some users need "callable" features that are re-usable even when variables have not been defined by the calling feature. Normally an undefined variable results in nasty JavaScript errors. But there is an elegant way you can specify a default value using the karate.get()
API:
# if foo is not defined, it will default to 42
* def foo = karate.get('foo', 42)
A word of caution: we recommend that you should not over-use Karate's capability of being able to re-use features. Re-use can sometimes result in negative benefits - especially when applied to test-automation. Prefer readability over re-use. See this for an example.
copy
For a call
(or callonce
) - payload / data structures (JSON, XML, Map-like or List-like) variables are 'passed by reference' which means that steps within the 'called' feature can update or 'mutate' them, for e.g. using the set
keyword. This is actually the intent most of the time and is convenient. If you want to pass a 'clone' to a 'called' feature, you can do so using the rarely used copy
keyword that works very similar to type conversion. This is best explained in this example: copy.feature
.
Examples of defining and using JavaScript functions appear in earlier sections of this document. Being able to define and re-use JavaScript functions is a powerful capability of Karate.
For an advanced example of how you can build and re-use a common set of JS functions, refer to this answer on Stack Overflow.
In real-life scripts, you would typically also use this capability of Karate to configure headers
where the specified JavaScript function uses the variables that result from a sign in to manipulate headers for all subsequent HTTP requests. And it is worth mentioning that the Karate configuration 'bootstrap' routine is itself a JavaScript function.
Also refer to the
eval
keyword for a simpler way to execute arbitrary JavaScript that can be useful in some situations.
call
When using call
(or callonce
), only one argument is allowed. But this does not limit you in any way, because similar to how you can call *.feature files
, you can pass a whole JSON object as the argument. In the case of the call
of a JavaScript function, you can also pass a JSON array or a primitive (string, number, boolean) as the solitary argument, and the function implementation is expected to handle whatever is passed.
Instead of using call
(or callonce
) you are always free to call JavaScript functions 'normally' and then you can use more than one argument.
* def adder = function(a, b){ return a + b }
* assert adder(1, 2) == 3
Naturally, only one value can be returned. But again, you can return a JSON object. There are two things that can happen to the returned value.
Either - it can be assigned to a variable like so.
* def returnValue = call myFunction
Or - if a call
is made without an assignment, and if the function returns a map-like object, it will add each key-value pair returned as a new variable into the execution context.
# while this looks innocent ...
# ... behind the scenes, it could be creating (or over-writing) a bunch of variables !
* call someFunction
While this sounds dangerous and should be used with care (and limits readability), the reason this feature exists is to quickly set (or over-write) a bunch of config variables when needed. In fact, this is the mechanism used when karate-config.js
is processed on start-up.
This behavior where all key-value pairs in the returned map-like object get automatically added as variables - applies to the calling of *.feature
files as well. In other words, when call
or callonce
is used without a def
, the 'called' script not only shares all variables (and configure
settings) but can update the shared execution context. This is very useful to boil-down those 'common' steps that you may have to perform at the start of multiple test-scripts - into one-liners. But use wisely, because called scripts will now over-write variables that may have been already defined.
* def config = { user: 'john', password: 'secret' }
# this next line may perform many steps and result in multiple variables set for the rest of the script
* call read('classpath:common-setup.feature') config
You can use callonce
instead of call
within the Background
in case you have multiple Scenario
sections or Examples
. Note the 'inline' use of the read function as a short-cut above. This applies to JS functions as well:
* call read('my-function.js')
These heavily commented demo examples can help you understand 'shared scope' better, and are designed to get you started with creating re-usable 'sign-in' or authentication flows:
Scope | Caller Feature | Called Feature |
---|---|---|
Isolated | call-isolated-headers.feature | common-multiple.feature |
Shared | call-updates-config.feature | common.feature |
Once you get comfortable with Karate, you can consider moving your authentication flow into a 'global' one-time flow using
karate.callSingle()
, think of it as 'callonce
on steroids'.
call vs read()
Since this is a frequently asked question, the different ways of being able to re-use code (or data) are summarized below.
Code | Description |
---|---|
* def login = read('login.feature') * call login | Shared Scope, and the login variable can be re-used |
* call read('login.feature') | short-cut for the above without needing a variable |
* def credentials = read('credentials.json') * def login = read('login.feature') * call login credentials | Note how using read() for a JSON file returns data - not "callable" code, and here it is used as the call argument |
* call read('login.feature') read('credentials.json') | You can do this in theory, but it is not as readable as the above |
* karate.call('login.feature') | The JS API allows you to do this, but this will not be Shared Scope |
* def result = call read('login.feature') | call result assigned to a variable and not Shared Scope |
* def result = karate.call('login.feature') | exactly equivalent to the above ! |
* if (cond) karate.call(true, 'login.feature') | if you need conditional logic and Shared Scope, add a boolean true first argument |
* def credentials = read('credentials.json') * def result = call read('login.feature') credentials | like the above, but with a call argument |
* def credentials = read('credentials.json') * def result = karate.call('login.feature', credentials) | like the above, but in JS API form, the advantage of the above form is that using an in-line argument is less "cluttered" (see next row) |
* def login = read('login.feature') * def result = call login { user: 'john', password: 'secret' } | using the call keyword makes passing an in-line JSON argument more "readable" |
* call read 'credentials.json' | Since "read " happens to be a function (that takes a single string argument), this has the effect of loading all keys in the JSON file into Shared Scope as variables ! This can be sometimes handy. |
* call read ('credentials.json') | A common mistake. First, there is no meaning in call for JSON. Second, the space after the " read " makes this equal to the above. |
* karate.set(read('credentials.json')) | For completeness - this has exactly the same effect as the above two rows ! |
There are examples of calling JVM classes in the section on Java Interop and in the file-upload demo. Also look at the section on commonly needed utilities for more ideas.
Calling any Java code is that easy. Given this custom, user-defined Java class:
package com.mycompany;
import java.util.HashMap;
import java.util.Map;
public class JavaDemo {
public Map<String, Object> doWork(String fromJs) {
Map<String, Object> map = new HashMap<>();
map.put("someKey", "hello " + fromJs);
return map;
}
public static String doWorkStatic(String fromJs) {
return "hello " + fromJs;
}
}
This is how it can be called from a test-script via JavaScript, and yes, even static methods can be invoked:
* def doWork =
"""
function(arg) {
var JavaDemo = Java.type('com.mycompany.JavaDemo');
var jd = new JavaDemo();
return jd.doWork(arg);
}
"""
# in this case the solitary 'call' argument is of type string
* def result = call doWork 'world'
* match result == { someKey: 'hello world' }
# using a static method - observe how java interop is truly seamless !
* def JavaDemo = Java.type('com.mycompany.JavaDemo')
* def result = JavaDemo.doWorkStatic('world')
* assert result == 'hello world'
Note that JSON gets auto-converted to Map
(or List
) when making the cross-over to Java. Refer to the cats-java.feature
demo for an example.
An additional level of auto-conversion happens when objects cross the boundary between JS and Java. In the rare case that you need to mutate a Map or List returned from Java while still within a JS block, use karate.toJson() to convert.
Another example is dogs.feature
- which actually makes JDBC (database) calls, and since the data returned from the Java code is JSON, the last section of the test is able to use match
very effectively for data assertions.
A great example of how you can extend Karate, even bypass the HTTP client but still use Karate's test-automation effectively, is this gRPC example by @thinkerou: karate-grpc
. And you can even handle asynchronous flows such as listening to message-queues.
This should make it clear why Karate does not provide 'out of the box' support for any particular HTTP authentication scheme. Things are designed so that you can plug-in what you need, without needing to compile Java code. You get to choose how to manage your environment-specific configuration values such as user-names and passwords.
First the JavaScript file, basic-auth.js
:
function fn(creds) {
var temp = creds.username + ':' + creds.password;
var Base64 = Java.type('java.util.Base64');
var encoded = Base64.getEncoder().encodeToString(temp.toString().getBytes());
return 'Basic ' + encoded;
}
And here's how it works in a test-script using the header
keyword.
* header Authorization = call read('basic-auth.js') { username: 'john', password: 'secret' }
You can set this up for all subsequent requests or dynamically generate headers for each HTTP request if you configure headers
.
callonce
Cucumber has a limitation where Background
steps are re-run for every Scenario
. And if you have a Scenario Outline
, this happens for every row in the Examples
. This is a problem especially for expensive, time-consuming HTTP calls, and this has been an open issue for a long time.
Karate's callonce
keyword behaves exactly like call
but is guaranteed to execute only once. The results of the first call are cached, and any future calls will simply return the cached result instead of executing the JavaScript function (or feature) again and again.
This does require you to move 'set-up' into a separate *.feature
(or JavaScript) file. But this totally makes sense for things not part of the 'main' test flow and which typically need to be re-usable anyway.
So when you use the combination of callonce
in a Background
, you can indeed get the same effect as using a @BeforeClass
annotation, and you can find examples in the karate-demo, such as this one: callonce.feature
.
A callonce
is ideally used for only "pure" JSON. You may face issues if you attempt to mix in JS functions or Java code. See karate.callSingle()
.
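For instance, a minimal sketch that re-uses the sign-in feature from the earlier example:
Feature: multiple scenarios with a one-time sign-in

Background:
# executed only once, even with multiple Scenario-s in this feature
* def signIn = callonce read('classpath:my-signin.feature') { username: 'john', password: 'secret' }
* def authToken = signIn.authToken

Scenario: first
* print authToken

Scenario: second
* print authToken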
eval
This is for evaluating arbitrary JavaScript and you are advised to use this only as a last resort ! Conditional logic is not recommended especially within test scripts because tests should be deterministic.
There are a few situations where this comes in handy:
- when you need to use a JavaScript keyword such as if in its 'raw' form (also see conditional logic)
- as a dynamic alternative to set and remove - by using karate.set() and karate.remove()

# just perform an action, we don't care about saving the result
* eval myJavaScriptFunction()
# do something only if a condition is true
* eval if (zone == 'zone1') karate.set('temp', 'after')
As a convenience, you can omit the eval
keyword and so you can shorten the above to:
* myJavaScriptFunction()
* if (zone == 'zone1') karate.set('temp', 'after')
This is very convenient especially if you are calling a method on a variable that has been defined such as the karate
object, and for general-purpose scripting needs such as UI automation. Note how karate.set()
and karate.remove()
below are used directly as a script "statement".
# you can use multiple lines of JavaScript if needed
* eval
"""
var foo = function(v){ return v * v };
var nums = [0, 1, 2, 3, 4];
var squares = [];
for (var n in nums) {
squares.push(foo(n));
}
karate.set('temp', squares);
"""
* match temp == [0, 1, 4, 9, 16]
* def json = { a: 1 }
* def key = 'b'
# use dynamic path expressions to mutate json
* json[key] = 2
* match json == { a: 1, b: 2 }
* karate.remove('json', key)
* match json == { a: 1 }
* karate.set('json', '$.c[]', { d: 'e' })
* match json == { a: 1, c: [{ d: 'e' }] }
Advanced / Tricks
The built-in retry until
syntax should suffice for most needs, but if you have some specific needs, this demo example (using JavaScript) should get you up and running: polling.feature
.
The keywords Given
When
Then
are only for decoration and should not be thought of as similar to an if - then - else
statement. And as a testing framework, Karate discourages tests that give different results on every run.
That said, if you really need to implement 'conditional' checks, this can be one pattern:
* def filename = zone == 'zone1' ? 'test1.feature' : 'test2.feature'
* def result = call read(filename)
And this is another, using karate.call()
. Here we want to call
a file only if a condition is satisfied:
* def result = responseStatus == 404 ? {} : karate.call('delete-user.feature')
Or if we don't care about the result, we can eval
an if
statement:
* if (responseStatus == 200) karate.call('delete-user.feature')
And this may give you more ideas. You can always use a JavaScript function or call Java for more complex logic.
* def expected = zone == 'zone1' ? { foo: '#string' } : { bar: '#number' }
* match response == expected
You can always use a JavaScript switch case
within an eval
or function block. But one pattern that you should be aware of is that JSON is actually a great data-structure for looking up data.
* def data =
"""
{
foo: 'hello',
bar: 'world'
}
"""
# in real-life key can be dynamic
* def key = 'bar'
# and used to lookup data
* match (data[key]) == 'world'
You can find more details here. Also note how you can wrap the LHS of the match
in parentheses in the rare cases where the parser expects JsonPath by default.
In some rare cases you need to exit a Scenario
based on some condition. You can use karate.abort()
like so:
* if (responseStatus == 404) karate.abort()
Using karate.abort()
will not fail the test. Conditionally making a test fail is easy with karate.fail()
* if (condition) karate.fail('a custom message')
But normally a match
statement is preferred unless you want a really descriptive error message.
Also refer to polling for more ideas.
Since it is so easy to dive into Java-interop, Karate does not include any random-number functions, uuid generator or date / time utilities out of the box. You simply roll your own.
Here is an example of how to get the current date, formatted the way you want:
* def getDate =
"""
function() {
var SimpleDateFormat = Java.type('java.text.SimpleDateFormat');
var sdf = new SimpleDateFormat('yyyy/MM/dd');
var date = new java.util.Date();
return sdf.format(date);
}
"""
* def temp = getDate()
* print temp
And the above will result in something like this being logged: [print] 2017/10/16
.
Here below are a few more common examples:
Utility | Recipe |
---|---|
System Time (as a string) | function(){ return java.lang.System.currentTimeMillis() + '' } |
UUID | function(){ return java.util.UUID.randomUUID() + '' } |
Random Number (0 to max-1 ) | function(max){ return Math.floor(Math.random() * max) } |
Case Insensitive Comparison | function(a, b){ return a.equalsIgnoreCase(b) } |
Sleep or Wait for pause milliseconds | function(pause){ java.lang.Thread.sleep(pause) } |
The first three are good enough for random string generation for most situations. Note that if you need to do a lot of case-insensitive string checks, karate.lowerCase()
is what you are looking for.
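And here is how one of these recipes can be put to work in a feature file (the printed value will vary on each run):
* def uuid = function(){ return java.util.UUID.randomUUID() + '' }
* def id = uuid()
* print 'generated id:', id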
If you find yourself needing a complex helper or utility function, we strongly recommend that you use Java because it is much easier to maintain and even debug if needed. And if you need multiple functions, you can easily organize them into a single Java class with multiple static methods.
That said, if you want to stick to JavaScript, but find yourself accumulating a lot of helper functions that you need to use in multiple feature files, the following pattern is recommended.
You can organize multiple "common" utilities into a single re-usable feature file as follows e.g. common.feature
@ignore
Feature:
Scenario:
* def hello = function(){ return 'hello' }
* def world = function(){ return 'world' }
And then you have two options. The first option using shared scope should be fine for most projects, but if you want to "name space" your functions, use "isolated scope":
Scenario: function re-use, global / shared scope
* call read('common.feature')
* assert hello() == 'hello'
* assert world() == 'world'
Scenario: function re-use, isolated / name-spaced scope
* def utils = call read('common.feature')
* assert utils.hello() == 'hello'
* assert utils.world() == 'world'
You can even move commonly used routines into karate-config.js
which means that they become "global". But we recommend that you do this only if you are sure that these routines are needed in almost all *.feature
files. Bloating your configuration can lead to loss of performance, and maintainability may suffer.
The JS API has a karate.signal(result)
method that is useful for involving asynchronous flows into a test.
listen
You use the listen
keyword (with a timeout) to wait until that event occurs. The listenResult
magic variable will hold the value passed to the call to karate.signal()
.
This is best explained in this example that involves listening to an ActiveMQ / JMS queue.
Note how JS functions defined at run-time can be mixed with custom Java code to get things done. You need to use karate.toJava()
to "wrap" JS functions passed to custom Java code.
Background:
* def QueueConsumer = Java.type('mock.contract.QueueConsumer')
* def queue = new QueueConsumer(queueName)
* def handler = function(msg){ karate.signal(msg) }
* queue.listen(karate.toJava(handler))
* url paymentServiceUrl + '/payments'
Scenario: create, get, update, list and delete payments
Given request { amount: 5.67, description: 'test one' }
When method post
Then status 200
And match response == { id: '#number', amount: 5.67, description: 'test one' }
And def id = response.id
* listen 5000
* json shipment = listenResult
* print '### received:', shipment
* match shipment == { paymentId: '#(id)', status: 'shipped' }
JavaScript functions have some limitations when combined with multi-threaded Java code. So it is recommended that you directly use a Java Function
when possible instead of using the karate.toJava()
"wrapper" as shown above.
One pattern you can adopt is to create a "factory" method that returns a Java function - where you can easily delegate to the logic you want. For example, see the sayHelloFactory()
method below:
public class Hello {
public static String sayHello(String message) {
return "hello " + message;
}
public static Function<String, String> sayHelloFactory() {
return s -> sayHello(s);
}
}
And now, to get a reference to that "function" you can do this:
* def sayHello = Java.type('com.myco.Hello').sayHelloFactory()
This can be convenient when using shared scope because you can just call sayHello('myname')
where needed.
Karate also has built-in support for websocket that is based on the async capability. The following method signatures are available on the karate
JS object to obtain a websocket reference:
- karate.webSocket(url)
- karate.webSocket(url, handler)
- karate.webSocket(url, handler, options) - where options is an optional JSON (or map-like) object that takes the following optional keys:
  - subProtocol - in case the server expects it
  - headers - another JSON of key-value pairs
  - maxPayloadSize - this defaults to 4194304 (bytes, around 4 MB)

These will init a websocket client for the given url
and optional subProtocol
. If a handler
function (returning a boolean) is provided - it will be used to complete the "wait" of socket.listen()
if true
is returned - where socket
is the reference to the websocket client returned by karate.webSocket()
. A handler function is needed only if you have to ignore other incoming traffic. If you need custom headers for the websocket handshake, use JSON as the last argument.
Here is an example, where the same websocket connection is used to send as well as receive a message.
* def handler = function(msg){ return msg.startsWith('hello') }
* def socket = karate.webSocket(demoBaseUrl + '/websocket', handler)
* socket.send('Billie')
* def result = socket.listen(5000)
* match result == 'hello Billie !'
For handling binary messages, the same karate.webSocket()
method signatures exist for karate.webSocketBinary()
. Refer to these examples for more: echo.feature
| websocket.feature
. Note that any websocket instances created will be auto-closed at the end of the Scenario
.
Gherkin has a great way to sprinkle meta-data into test-scripts - which gives you some interesting options when running tests in bulk. The most common use-case would be to partition your tests into 'smoke', 'regression' and the like - which enables being able to selectively execute a sub-set of tests.
The documentation on how to run tests via the command line has an example of how to use tags to decide which tests to not run (or ignore). Also see first.feature
and second.feature
in the demos. If you find yourself juggling multiple tags with logical AND
and OR
complexity, refer to this Stack Overflow answer.
For advanced users, Karate supports being able to query for tags within a test, and even tags in a
@name=value
form. Refer tokarate.tags
andkarate.tagValues
.
For completeness, the "built-in" tags are the following:
Tag | Description |
---|---|
@ignore | Any Scenario with (or that has inherited) this tag will be skipped at run-time. This does not apply to anything that is "called" though |
@parallel | See @parallel=false |
@report | See @report=false |
@env | See below |
@envnot | See below |
There are two special tags that allow you to "select" or "un-select" a Scenario
depending on the value of karate.env
. This can be really convenient, for example to never run some tests in a certain "production like" or sensitive environment.
@env=foo,bar
- will run only when the value of karate.env
is not-null and equal to foo
or bar
@envnot=foo
- will run when the value of karate.env
is null
or anything other than foo
Here is an example:
@env=dev
Scenario: runs only when karate.env is 'dev'
* print 'karate.env is:', karate.env
Since multiple values are supported, you can also do this:
@envnot=perf,prod
Scenario: never runs in perf or prod
* print 'karate.env is:', karate.env
A little-known capability of the Cucumber / Gherkin syntax is to be able to tag even specific rows in a bunch of examples ! You have to repeat the Examples
section for each tag. The example below combines this with the advanced features described above.
Scenario Outline: examples partitioned by tag
* def vals = karate.tagValues
* match vals.region[0] == expected
@region=US
Examples:
| expected |
| US |
@region=GB
Examples:
| expected |
| GB |
Note that if you tag Examples
like this, and if a tag selector is used when running a given Feature
- only the Examples
that match the tag selector will be executed. There is no concept of a "default" where for e.g. if there is no matching tag - that the Examples
without a tag will be executed. But note that you can use the negative form of a tag selector: ~@region=GB
.
In situations where you start an (embedded) application server as part of the test set-up phase, a typical challenge is that the HTTP port may be determined at run-time. So how can you get this value injected into the Karate configuration ?
It so happens that the karate
object has a field called properties
which can read a Java system-property by name like this: karate.properties['myName']
. Since the karate
object is injected within karate-config.js
on start-up, it is a simple and effective way for other processes within the same JVM to pass configuration values to Karate at run-time. Refer to the 'demo' karate-config.js
for an example and how the demo.server.port
system-property is set-up in the test runner: TestBase.java
.
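For instance, a minimal karate-config.js sketch (the '8080' fallback is an illustrative assumption; demo.server.port matches the demo mentioned above):
function fn() {
  // read the port that the embedded server chose at run-time
  var port = karate.properties['demo.server.port'] || '8080'; // fallback is illustrative
  return { baseUrl: 'http://localhost:' + port };
}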
Karate has a set of Java API-s that expose the HTTP, JSON, data-assertion and UI automation capabilities. The primary classes are described below.
- Http - build and execute any HTTP request and retrieve responses
- Json - build and manipulate JSON data using JsonPath expressions, convert to and from Java Map-s and List-s, parse strings into JSON and convert Java objects into JSON
- Match - exposes all of Karate's match capabilities, and this works for Java Map and List objects
- Driver - perform web-browser automation

Do note that if you choose the Java API, you will naturally lose some of the test-automation framework benefits such as HTML reports, parallel execution and JavaScript / configuration. You may have to rely on unit-testing frameworks or integrate additional dependencies.
jbang is a great way for you to install and execute scripts that use Karate's Java API on any machine with minimal setup. Note that jbang itself is super-easy to install and there is even a "Zero Install" option.
Here below is an example jbang script that uses the Karate Java API to do some useful work. Name the file as javadsl.java
and run using the command: jbang javadsl.java
.
please replace
RELEASE
with the exact version of Karate you intend to use if applicable
///usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS com.intuit.karate:karate-core:RELEASE:all
import com.intuit.karate.*;
import java.util.List;
public class javadsl {
public static void main(String[] args) {
List users = Http.to("https://jsonplaceholder.typicode.com/users")
.get().json().asList();
Match.that(users.get(0)).contains("{ name: 'Leanne Graham' }");
String city = Json.of(users).get("$[0].address.city");
Match.that("Gwenborough").isEqualTo(city);
System.out.println("\n*** second user: " + Json.of(users.get(1)).toString());
}
}
Read the documentation of the stand-alone JAR for more - such as how you can even install custom command-line applications using jbang !
It is also possible to invoke a feature file via a Java API which can be useful in some test-automation situations.
A common use case is to mix API-calls into a larger test-suite, for example a Selenium or WebDriver UI test. So you can use Karate to set-up data via API calls, then run the UI test-automation, and finally again use Karate to assert that the system-state is as expected. Note that you can even include calls to a database from Karate using Java interop. And this example may make it clear why using Karate itself to drive even your UI-tests may be a good idea.
The static method com.intuit.karate.Runner.runFeature()
is best explained in this demo unit-test: JavaApiTest.java
.
You can optionally pass in variable values or over-ride config via a HashMap
or leave the second-last argument as null
. The variable state after feature execution would be returned as a Map<String, Object>
. The last boolean
argument is whether the karate-config.js
should be processed or not. Refer to the documentation on type-conversion to make sure you can 'unpack' data returned from Karate correctly, especially when dealing with XML.
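As a hedged sketch of such an invocation (the feature path and variable names are illustrative):
import com.intuit.karate.Runner;
import java.util.HashMap;
import java.util.Map;

public class FeatureInvoker {
    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("name", "Billie");
        // the last argument controls whether karate-config.js is processed
        Map<String, Object> result = Runner.runFeature("classpath:demo/cats/create.feature", vars, true);
        // the returned map holds the variable state after feature execution
        System.out.println(result.get("response"));
    }
}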
If you are looking for Cucumber 'hooks', Karate does not support them, mainly because they depend on Java code, which goes against the Karate Way™.
Instead, Karate gives you all you need as part of the syntax. Here is a summary:
To Run Some Code | How |
---|---|
Before everything (or 'globally' once) | See karate.callSingle() |
Before every Scenario | Use the Background . Note that karate-config.js is processed before every Scenario - so you can choose to put "global" config here, for example using karate.configure() . |
Once (or at the start of) every Feature | Use a callonce in the Background . The advantage is that you can set up variables (using def if needed) which can be used in all Scenario -s within that Feature . |
After every Scenario | configure afterScenario (see example) |
At the end of the Feature | configure afterFeature (see example) |
Note that for the afterFeature hook to work, you should be using the Runner API and not the JUnit runner.
karate.callSingle()
Only recommended for advanced users, but this guarantees a routine is run only once, even when running tests in parallel. You can use karate.callSingle()
in karate-config.js
like this:
var result = karate.callSingle('classpath:some/package/my.feature');
It can take a second JSON argument following the same rules as call
. Once you get a result, you typically use it to set global variables.
Refer to this example:
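(a minimal karate-config.js sketch follows - the feature path and the returned 'token' key are illustrative)
function fn() {
  var config = { baseUrl: 'https://some-host' };
  // guaranteed to run only once, even when features execute in parallel
  var auth = karate.callSingle('classpath:common/get-token.feature', config);
  // use the result to set 'global' variables
  config.authToken = auth.token;
  return config;
}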
You can use karate.callSingle()
directly in a *.feature
file, but it logically fits better in the global "bootstrap". Ideally it should return "pure JSON" and note that you always get a "deep clone" of the cached result object.
IMPORTANT: There are some restrictions when using callonce
or karate.callSingle()
especially within karate-config.js
. Ideally you should return only pure JSON data (or a primitive string, number etc.). Keep in mind that the reason this exists is to "cache" data, and not behavior. So if you return complex objects such as a custom Java instance or a JS function that depends on complex objects, this may cause issues when you run in parallel. If you really need to re-use a Java function, see Java Function References.
configure callSingleCache
When re-running tests in development mode and when your test suite depends on say an Authorization
header set by karate.callSingle()
, you can cache the results locally to a file, which is very convenient when your "auth token" is valid for a period of a few minutes - which typically is the case. This means that as long as the token "on file" is valid, you can save time by not having to make the one or two HTTP calls needed to "sign-in" or create "throw-away" users in your SSO store.
So in "dev mode" you can easily set this behavior like this. Just ensure that this is "configured" before you use karate.callSingle()
:
if (karate.env == 'local') {
karate.configure('callSingleCache', { minutes: 15 });
}
By default Karate will use target
(or build
) as the "cache" folder, which you can over-ride by adding a dir
key:
karate.configure('callSingleCache', { minutes: 15, dir: 'some/other/folder' });
This caching behavior will work only if the result of
karate.callSingle()
is a JSON-like object, and any JS functions or Java objects mixed in will be lost.
Cucumber has a concept of Scenario Outlines where you can re-use a set of data-driven steps and assertions, and the data can be declared in a very user-friendly fashion. Observe the usage of `Scenario Outline:` instead of `Scenario:`, and the new `Examples:` section.
You should take a minute to compare this with the exact same example implemented in REST-assured and TestNG. Note that this example only does a "string equals" check on parts of the JSON, but with Karate you are always encouraged to match the entire payload in one step.
Feature: karate answers 2
Background:
* url 'http://localhost:8080'
Scenario Outline: given circuit name, validate country
Given path 'api/f1/circuits/<name>.json'
When method get
Then match $.MRData.CircuitTable.Circuits[0].Location.country == '<country>'
Examples:
| name | country |
| monza | Italy |
| spa | Belgium |
| sepang | Malaysia |
Scenario Outline: given race number, validate number of pitstops for Max Verstappen in 2015
Given path 'api/f1/2015/<race>/drivers/max_verstappen/pitstops.json'
When method get
Then assert response.MRData.RaceTable.Races[0].PitStops.length == <stops>
Examples:
| race | stops |
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
| 4 | 2 |
This is great for testing boundary conditions against a single end-point, with the added bonus that your test becomes even more readable. This approach can certainly enable product-owners or domain-experts who are not programmer-folk to review, and even collaborate on, test-scenarios and scripts.
Karate has enhanced the Cucumber `Scenario Outline` as follows:

- if an `Examples` column header has a `!` appended, each value in that column will be evaluated as a JavaScript data-type (number, boolean, or even in-line JSON) - else it defaults to string.
- `__row` gives you the entire row as a JSON object, and `__num` gives you the row index (the first row is `0`).
- in addition to `__row`, each column key-value will be available as a separate variable, which greatly simplifies JSON manipulation - especially when you want to re-use JSON files containing embedded expressions.
- if a "cell" value is an empty string, it will result in a `null` value for that column-key, and this can be useful to remove nodes from JSON or XML documents.

These are best explained with examples. You can choose between the string-placeholder style `<foo>` or directly refer to the variable `foo` (or even the whole row JSON as `__row`) in JSON-friendly expressions.
Note that even the scenario name can accept placeholders - which is very useful in reports.
Scenario Outline: name is <name> and age is <age>
* def temp = '<name>'
* match temp == name
* match temp == __row.name
* def expected = __num == 0 ? 'name is Bob and age is 5' : 'name is Nyan and age is 6'
* match expected == karate.scenario.name
Examples:
| name | age |
| Bob | 5 |
| Nyan | 6 |
Scenario Outline: magic variables with type hints
* def expected = [{ name: 'Bob', age: 5 }, { name: 'Nyan', age: 6 }]
* match __row == expected[__num]
Examples:
| name | age! |
| Bob | 5 |
| Nyan | 6 |
Scenario Outline: embedded expressions and type hints
* match __row == { name: '#(name)', alive: '#boolean' }
Examples:
| name | alive! |
| Bob | false |
| Nyan | true |
Scenario Outline: inline json
* match __row == { first: 'hello', second: { a: 1 } }
* match first == 'hello'
* match second == { a: 1 }
Examples:
| first | second! |
| hello | { a: 1 } |
For another example, see: `examples.feature`.
If you're looking for more complex ways of dynamically naming your scenarios, you can use JS string interpolation by including placeholders in your scenario name.
Scenario Outline: name is ${name.first} ${name.last} and age is ${age}
* match name.first == "#? _ == 'Bob' || _ == 'Nyan'"
* match name.last == "#? _ == 'Dylan' || _ == 'Cat'"
* match title == karate.scenario.name
Examples:
| name! | age | title |
| { "first": "Bob", "last": "Dylan" } | 10 | name is Bob Dylan and age is 10 |
| { "first": "Nyan", "last": "Cat" } | 5 | name is Nyan Cat and age is 5 |
String interpolation will support variables in scope and/or the `Examples` (including functions defined globally, but not functions defined in the `Background`). Even Java interop and access to the `karate` JS API would work.
For some more examples, check `test-outline-name-js.feature`.
The limitation of the Cucumber `Scenario Outline:` (seen above) is that the number of rows in the `Examples:` is fixed. But take a look at how Karate can loop over a `*.feature` file for each object in a JSON array - which gives you dynamic data-driven testing, if you need it. For advanced examples, refer to some of the scenarios within this demo: `dynamic-params.feature`.
Also see the option below, where you can data-drive an `Examples:` table using JSON.
You can feed an `Examples` table from a custom data-source, which is great for those situations where the table-content is dynamically resolved at run-time. This capability is triggered when the table consists of a single "cell", i.e. there is exactly one row and one column in the table.
This technique has one caveat to be aware of regarding isolation of tests running in parallel. The `Background` section is only run once, in order to set up the list of dynamic scenarios. This means that any other steps within the `Background` are not repeated for each individual example. This is different behaviour from normal scenarios, where each `Scenario` also runs the `Background` steps.
The "scenario expression" result is expected to be an array of JSON objects. Here is an example (also see this video):
Feature: scenario outline using a dynamic table
Background:
* def kittens = read('../callarray/kittens.json')
Scenario Outline: cat name: <name>
Given url demoBaseUrl
And path 'cats'
And request { name: '#(name)' }
When method post
Then status 200
And match response == { id: '#number', name: '#(name)' }
# the single cell can be any valid karate expression
# and even reference a variable defined in the Background
Examples:
| kittens |
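For clarity, the `kittens.json` read in the `Background` above is expected to be an array of JSON objects, one per generated `Scenario` - something like this sketch (the actual file in the Karate demos may differ):

```json
[
  { "name": "Bob" },
  { "name": "Wild" },
  { "name": "Nyan" }
]
```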
The great thing about this approach is that you can set up the JSON array using the `Background` section. Any Karate expression can be used in the "cell expression", and you can even use Java-interop to use external data-sources such as a database. Note that Karate has built-in support for CSV files, and here is an example: `dynamic-csv.feature`.
An advanced option is where the "scenario expression" returns a JavaScript "generator" function. This is a very powerful way to generate test-data without having to load a large number of data rows into memory. The function has to return a JSON object. To signal the end of the data, just return `null`. The function argument is the row-index, so you can easily determine when to stop the generation of data. Here is an example:
Feature: scenario outline using a dynamic generator function
Background:
* def generator = function(i){ if (i == 20) return null; return { name: 'cat' + i, age: i } }
Scenario Outline: cat name: <name>
Given url demoBaseUrl
And path 'cats'
And request { name: '#(name)', age: '#(age)' }
When method post
Then status 200
And match response == { id: '#number', name: '#(name)' }
Examples:
| generator |
Author: karatelabs
Source code: https://github.com/karatelabs/karate
License: MIT license
#java #testing