Tia Gottlieb



Diving into Your Documents with DocAI

We recently announced the general availability (GA) of the Document AI Platform, Google’s solution for automating and validating documents to streamline document workflows. Important business data is not always readily available in computer-readable formats. Much of it sits in what we call dark formats, such as PDFs, handwritten forms, and images.

The platform is a console for document processing where customers can quickly access all parsers, tools, and solutions. Lending DocAI and Procurement DocAI are now also GA. These workflow solutions are built on our specialized parsers, with models for common enterprise document types such as tax forms, invoices, receipts, and more.


So why use it? Your business is most likely sitting on a treasure trove of unstructured data, or you may have document workflows that require several manual steps. DocAI can help you programmatically extract data so you can gather insights with data analytics, and it can help automate tedious, error-prone tasks. Use one of our client libraries to ingest your documents and produce structured data in our new unified document format.


How to Automate Excel Via PowerShell without installing Excel

PowerShell + Excel = Better Together

Automate Excel via PowerShell without having Excel installed. Runs on Windows, Linux, and macOS. Creating tables, pivot tables, charts, and much more has just become a lot easier.




CI System      Environment      Status
Azure DevOps   Windows          Build Status
Azure DevOps   Windows (Core)   Build Status
Azure DevOps   Ubuntu           Build Status
Azure DevOps   macOS            Build Status


Install from the PowerShell Gallery.

Install-Module -Name ImportExcel


If this project helped you get your job done faster, let me know, or send a coffee.



Installation

PowerShell 5 and later

You can install the ImportExcel module directly from the PowerShell Gallery.

[Recommended] Install to your personal PowerShell Modules folder

Install-Module ImportExcel -scope CurrentUser

[Requires Elevation] Install for Everyone (computer PowerShell Modules folder)

Install-Module ImportExcel

Continuous Integration Updates

Big thanks to Illy for taking the Azure DevOps CI to the next level. Improved badges, improved matrix for cross platform OS testing and more.

Plus, he wired the PowerShell ScriptAnalyzer Excel report we built into each run as an artifact.

What's new 7.1.3

What's new 7.1.2

  • Get-ExcelFileSummary - Gets summary information on an Excel file like number of rows, columns, and more
Get-ChildItem -Path . -Recurse -Filter *.xlsx | Get-ExcelFileSummary | Format-Table

ExcelFile          WorksheetName Rows Columns Address Path
---------          ------------- ---- ------- ------- ----
Grades.xlsx        Sheet1          21       3 A1:C21  D:\temp\ExcelYouTube\Grades
GradesAverage.xlsx Sheet1          21       5 A1:E21  D:\temp\ExcelYouTube\Grades
AllShifts.xlsx     Sheet1          21       2 A1:B21  D:\temp\ExcelYouTube\SeparateData
Shift_1.xlsx       Sheet1          10       2 A1:B10  D:\temp\ExcelYouTube\SeparateData
Shift_2.xlsx       Sheet1           8       2 A1:B8   D:\temp\ExcelYouTube\SeparateData
Shift_3.xlsx       Sheet1           5       2 A1:B5   D:\temp\ExcelYouTube\SeparateData
Shifts.xlsx        Shift_1         10       2 A1:B10  D:\temp\ExcelYouTube\SeparateData
Shifts.xlsx        Shift_2          8       2 A1:B8   D:\temp\ExcelYouTube\SeparateData

What's new 7.1.1

  • Merged Nate Ferrell's Linux fix. Thanks!
  • Moved Export-MultipleExcelSheets from psm1 to Examples/Experimental
  • Deleted the CI build in Appveyor
  • Thank you Mikey Bronowski for
    • Multiple sweeps
    • Standardising casing of parameter names, and variables
    • Plus updating more than 50 of the examples and making them consistent.

What's new 7.1.0

Fixes, Updates and new Examples



  • Add -AsDate support to Import-Excel and ConvertFrom-ExcelSheet
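Here is a minimal sketch of -AsDate. It assumes ImportExcel 7.1.0 or later is installed. The temp file and the OrderDate column exist only for this demo. As this sketch uses it, -AsDate takes the name or names of the columns whose values should be returned as [datetime].

```powershell
# Assumes ImportExcel 7.1.0+ is installed; the file and column names are hypothetical.
Import-Module ImportExcel

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'asdate-demo.xlsx'
Remove-Item -Path $path -ErrorAction SilentlyContinue

# Write one row that contains a DateTime value.
[pscustomobject]@{ Order = 1; OrderDate = [datetime]'2021-03-15' } |
    Export-Excel -Path $path

# Ask Import-Excel to return the OrderDate column as [datetime].
$rows = Import-Excel -Path $path -AsDate OrderDate
$rows[0].OrderDate.GetType().Name
```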

New Examples

  • Pester-To-XLSx: Runs Pester, collects the results, enriches them, and exports them to Excel (Pester-To-XLSx.ps1)
  • DSUM: Sums up the numbers in a field (column) of records in a list or database that match conditions you specify (DSUM.ps1)
  • VLookup: Sets up a sheet where you enter the name of an item and the amount is looked up (VLOOKUP.ps1)
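The DSUM and VLOOKUP examples build these calculations into a worksheet. The sketch below shows what the two Excel functions compute, using plain PowerShell over in-memory objects. It is not the code in DSUM.ps1 or VLOOKUP.ps1. Get-ConditionalSum, Get-LookupValue, and the sample records are all hypothetical.

```powershell
# Hypothetical in-memory records standing in for worksheet rows.
$records = @(
    [pscustomobject]@{ Product = 'Apple';  Region = 'East'; Amount = 120 }
    [pscustomobject]@{ Product = 'Banana'; Region = 'West'; Amount = 80 }
    [pscustomobject]@{ Product = 'Apple';  Region = 'West'; Amount = 45 }
    [pscustomobject]@{ Product = 'Cherry'; Region = 'East'; Amount = 200 }
)

# DSUM-style: sum one field over the records that satisfy every condition.
function Get-ConditionalSum {
    param(
        [object[]]$Record,
        [string]$Field,
        [hashtable]$Condition   # property name -> value it must equal
    )
    $matching = $Record | Where-Object {
        $row = $_
        # Keep the row only if no condition fails for it.
        -not ($Condition.GetEnumerator() | Where-Object { $row.($_.Key) -ne $_.Value })
    }
    if (-not $matching) { return 0 }
    ($matching | Measure-Object -Property $Field -Sum).Sum
}

# VLOOKUP-style exact match: value from the first row whose key column equals $Key.
function Get-LookupValue {
    param([object[]]$Table, [string]$KeyField, $Key, [string]$ReturnField)
    $hit = $Table | Where-Object { $_.$KeyField -eq $Key } | Select-Object -First 1
    if ($hit) { $hit.$ReturnField }   # no match returns nothing (Excel would show #N/A)
}

Get-ConditionalSum -Record $records -Field Amount -Condition @{ Product = 'Apple' }   # 165
Get-ConditionalSum -Record $records -Field Amount -Condition @{ Region = 'East' }     # 320
Get-LookupValue -Table $records -KeyField Product -Key 'Cherry' -ReturnField Amount  # 200
```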

What's new 7.0.1

More infrastructure improvements.

  • Refine pipeline script analysis
  • Improve artifacts published
  • Add manifest (psd1) checks

What's new 7.0.0


  • Remove all functions from the psm1
  • Move functions into public subdirectory
  • Align TDD and continuous integration workflow for this refactor
  • Move help from functions to mdHelp and use PlatyPS to generate external help file

Thanks to James O'Neill for the refactor and Illy on the continuous integration.

What's new 6.5.3

Thanks again to the community for making this module even better.

  • Fix Import-Excel headers
  • Numerous improvements for DataTables and exporting them to Excel (James O'Neill)
    • Names, styles, proper appending
  • Handles marking the empty row on an empty table as a dummy row
  • Re-work code based on linting recommendations
  • Update existing tests and add more
  • Support PipelineVariable, thanks to Luc Dekens for reporting and Ili for the PR
  • Fix quoting in ConvertFrom-ExcelToSQLInsert (beckerben)

What's new 6.5.2

Thank you uSlackrill

  • Fixes Column order issue (plus tests) for Get-ExcelColumnName
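Get-ExcelColumnName turns a column number into its letter name. The sketch below shows the usual algorithm for this conversion, bijective base-26. There is no zero digit in this system, so Z is followed by AA. ConvertTo-ColumnLetter is a hypothetical stand-alone function, not the module's implementation. Off-by-one mistakes here are a common way for column-naming code to break once a sheet goes past column Z.

```powershell
# Hypothetical helper (not ImportExcel's Get-ExcelColumnName): 1 -> A, 27 -> AA.
function ConvertTo-ColumnLetter {
    param(
        # 16384 is column XFD, the last column an .xlsx sheet can have.
        [Parameter(Mandatory)][ValidateRange(1, 16384)][int]$ColumnNumber
    )
    $letters = ''
    $n = $ColumnNumber
    while ($n -gt 0) {
        # Shift to 0-based so 0..25 maps to A..Z; there is no "zero" letter.
        $remainder = ($n - 1) % 26
        $letters = [string][char](65 + $remainder) + $letters
        $n = [int][math]::Floor(($n - 1) / 26)
    }
    $letters
}

1, 26, 27, 52, 703, 16384 | ForEach-Object {
    '{0,5} -> {1}' -f $_, (ConvertTo-ColumnLetter $_)
}
# Letters produced: A, Z, AA, AZ, AAA, XFD
```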

Thank you jhoneill

  • Added -Force to Send-SQLDataToExcel so it sends something even if no rows are returned. (see #703)
  • Added -AsText to Import-Excel; see #164 (https://github.com/dfinke/ImportExcel/issues/164) and multiple others
  • Linux: the module now sets an environment variable if the support needed for AutoSize is present, and uses that environment variable to decide whether to skip AutoSize operations.
  • Fixed tests which needed AutoSize to work so they skip if the environment variable is set.
  • Fixed another break where on Azure the module never loaded.
  • Added a comment to ci.ps1 about better .NET version detection, and left some commented-out code.


What's new 6.5.0

This is now using the latest version of EPPlus. Unit tests are updated and passing. If you hit problems, please open an issue. You can roll back to an older version from the PowerShell Gallery if you are blocked.

  • Unit tests were updated and fixed
  • "Set-WorksheetProtection" is now switched on
  • Made Set-ExcelRange friendlier when autosizing on non-Windows platforms
  • Fixed: Windows-only tests don't attempt to run on non-Windows systems
  • Tests based on Get-Process don't attempt to run if <20 processes are returned
  • If $env:TEMP is not set (as will be the case on Linux)
  • Join-Path is used so paths are built with / or \ as suits the OS where the test is running.
  • Excel Sparklines now supported, check out the examples SalesByQuarter and Sparklines.

What's new 6.2.4

Sensible parameter defaults make your life easier and get things done faster.

  • Thank you to DomRRuggeri for the initial Out-Excel PR and kicking off the conversation on the improvements.
  • Thank you to ili101 for refactoring and improving the defaults, and adding the tests for parameters.
  • Creates a table, with filtering
  • Chooses a TableStyle
  • Displays the Excel spreadsheet automatically
Get-Process | Select-Object Company, Name, Handles | Export-Excel


What's new 6.2.3

Thank you jhoneill.

  • Refactored copy sheet and added pipe support
  • Add ClearAll to Set-ExcelRange
  • Fix broken test & regression for passwords
    • Note: Passwords do not work on pwsh. The EPPlus library does not support these dotnet core APIs at this time.

What's new 6.2.2

  • Fixed Import-Excel and relative path issue, added unit tests.

What's new 6.2.0

Thank you to James O'Neill

  • Fixed, Import-Excel can read xlsx files even if already open in Excel
  • Added New-ExcelStyle, plus -Style to Export-Excel and -Merge to Set-ExcelRange
  • Added Style Examples

What's new 6.1.0

Thank you to James O'Neill

  • Import-Excel can now take an Excel package object (from Open-ExcelPackage) instead of a path, which avoids re-reading the whole file when importing multiple parts of it. To allow multiple read operations, Import-Excel does NOT close the package, so you should use Close-ExcelPackage -NoSave to close it.
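Here is a minimal sketch of reading several sheets through one package. It assumes ImportExcel 6.1.0 or later. The example first creates the workbook in the temp folder so it is self-contained. The sheet names Q1 and Q2 are arbitrary. The try/finally makes sure the package is closed even if an import fails.

```powershell
# Assumes ImportExcel 6.1.0+ is installed; the workbook and sheet names are hypothetical.
Import-Module ImportExcel

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'multi-sheet-demo.xlsx'
Remove-Item -Path $path -ErrorAction SilentlyContinue
1..3 | ForEach-Object { [pscustomobject]@{ Quarter = 'Q1'; Value = $_ } } |
    Export-Excel -Path $path -WorksheetName Q1
4..5 | ForEach-Object { [pscustomobject]@{ Quarter = 'Q2'; Value = $_ } } |
    Export-Excel -Path $path -WorksheetName Q2

# Open the file once and read several sheets from the same package.
$package = Open-ExcelPackage -Path $path
try {
    $q1 = Import-Excel -ExcelPackage $package -WorksheetName Q1
    $q2 = Import-Excel -ExcelPackage $package -WorksheetName Q2
}
finally {
    # Import-Excel leaves the package open, so close it without saving.
    Close-ExcelPackage -ExcelPackage $package -NoSave
}
"Q1 rows: $($q1.Count); Q2 rows: $($q2.Count)"
```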

What's new 6.0.0

Thank you to James O'Neill for the optimizations, and refactoring leading to a ~10x speed increase. Thanks to ili101 for earlier PRs that provided the ground work for this.

  • Performance improvement to Export-Excel (see #506 and #555). This meant moving the code in Add-CellValue back into the process block of Export-Excel, because the overhead of calling the function was much greater than the time spent executing the code inside it. Blog post to follow. Some tests are showing a ~10x speed increase. #572 was about a broken #region tag in this part of the code, and that has been cleaned up in the process.
  • Export-Excel now has an -InputObject parameter (this was previously -TargetData, which is now an alias for -InputObject). If the input object is an array, each item will be inserted, so you can run Export-Excel -InputObject $x rather than $x | Export-Excel. If it is a System.Data.DataTable object, it will be inserted directly rather than cell by cell. Send-SQLDataToExcel takes advantage of this new functionality. There are simple tests for these new items.
  • Export-Excel previously assumed -Now if there were no other parameters; it now assumes -Now if there is no -Path or -ExcelPackage. The .PSD1 file now itemizes the items exported by the module (#557).
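Here is a sketch of the DataTable path. It assumes ImportExcel 6.0.0 or later, and the Inventory table and its columns are invented for the example. The DataTable goes in through -InputObject rather than the pipeline, because PowerShell would unroll a piped DataTable into its individual DataRow objects.

```powershell
# Assumes ImportExcel 6.0.0+ is installed; table, column and file names are hypothetical.
Import-Module ImportExcel

$dt = [System.Data.DataTable]::new('Inventory')
[void]$dt.Columns.Add('Part', [string])
[void]$dt.Columns.Add('Qty', [int])
[void]$dt.Rows.Add('Bolt', 250)
[void]$dt.Rows.Add('Nut', 410)

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'datatable-demo.xlsx'
Remove-Item -Path $path -ErrorAction SilentlyContinue

# Pass the whole DataTable via -InputObject so it is inserted in one step.
Export-Excel -Path $path -InputObject $dt -WorksheetName Inventory

Import-Excel -Path $path -WorksheetName Inventory
```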

What's new 5.4.5

Thank you to James O'Neill for the great additions.

  • Modified Send-SQLDataToExcel so it creates tables and ranges itself; previously it relied on Export-Excel to do this, which caused problems when adding data to an existing sheet (#555)
  • Added new command Add-ExcelDataValidation which will apply different data-validation rules to ranges of cells
  • Changed the export behavior so that (1) attempts to convert to a number only apply if the value was a string; (2) nulls are no longer converted to an empty string; (3) there is a specific check for URIs and not just text which is a valid URI. Using UNC names in hyperlinks remains problematic.
  • Changed the behavior of AutoSize in Export-Excel so it only applies to the exported columns. Previously, if something was exported next to pre-existing data, AutoSize would resize the whole sheet, potentially undoing things which had been set on the earlier data. If anyone relied on this behavior, they will need to explicitly tell the sheet to autosize with $sheet.Cells.AutoFitColumns() (where $sheet points to the sheet; it might be $ExcelPackage.Workbook.Worksheets['Name']).
  • In Compare-Worksheet, the key for comparing the sheets can now be written as a hash table with an expression. It is used with a Group-Object command, so if it is valid in Group-Object it should be accepted; this allows the creation of composite keys when the data being compared doesn't have a column which uniquely identifies rows.
  • In Set-ExcelRange, added a 'Locked' option equivalent to the checkbox on the Protection tab of the Format Cells dialog box in Excel.
  • Created a Set-WorksheetProtection function. This gives the same options as the protection dialog in Excel, but is a 0.9 release at the moment.
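Here is a sketch that combines two of these changes. It assumes ImportExcel 5.4.5 or later. First, Add-ExcelDataValidation restricts a column to a list of values. Second, an explicit $sheet.Cells.AutoFitColumns() call resizes the whole used range. The file, sheet, and allowed values are hypothetical. The AutoFit call is wrapped in try/catch because, as the notes above mention, autosize support may be missing on some non-Windows systems.

```powershell
# Assumes ImportExcel 5.4.5+ is installed; file, sheet and list values are hypothetical.
Import-Module ImportExcel

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'validation-demo.xlsx'
Remove-Item -Path $path -ErrorAction SilentlyContinue

$package = [pscustomobject]@{ Ticket = 101; Priority = 'High' } |
    Export-Excel -Path $path -WorksheetName Tickets -PassThru
$sheet = $package.Workbook.Worksheets['Tickets']

# Restrict column B (Priority) to a drop-down list of allowed values.
Add-ExcelDataValidation -Worksheet $sheet -Range 'B2:B100' -ValidationType List `
    -ValueSet @('High', 'Medium', 'Low') -ShowErrorMessage `
    -ErrorTitle 'Invalid priority' -ErrorBody 'Choose High, Medium or Low.'

# -AutoSize only touches exported columns; this resizes every used column instead.
try { $sheet.Cells.AutoFitColumns() }
catch { Write-Warning "AutoFit is not available on this platform: $_" }

Close-ExcelPackage -ExcelPackage $package
```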

New Example

What's new 5.4.4

  • Fix issue when only a single property is piped into Export-Excel
  • Fix issue in Copy-ExcelWorksheet, close the $Stream

What's new 5.4.3

  • Added Remove-Worksheet: Removes one or more worksheets from one or more workbooks

What's new 5.4.2

Added parameters -GroupDateRow and -GroupDatePart, plus -GroupNumericRow, -GroupNumericMin, -GroupNumericMax and -GroupNumericInterval, to Add-PivotTable and New-PivotTableDefinition. The date parameters gather dates of the same year and/or quarter and/or month and/or day, etc. The numeric parameters group numbers into bands, starting at Min and going up in steps specified by Interval. Added tests and help for these.
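To make the numeric grouping concrete, here is a plain-PowerShell sketch of banding values by Min, Max, and Interval. Get-NumericBand is hypothetical and only approximates the idea. In a real pivot table, Excel's own grouping decides the final band labels.

```powershell
# Hypothetical helper (not part of ImportExcel): returns a band label for one number.
# Bands are half-open: "30-40" holds values from 30 up to, but not including, 40.
function Get-NumericBand {
    param(
        [Parameter(Mandatory)][double]$Value,
        [double]$Min = 0,
        [double]$Max = 100,
        [ValidateScript({ $_ -gt 0 })][double]$Interval = 10
    )
    # Values outside Min..Max get their own catch-all labels.
    if ($Value -lt $Min) { return "<$Min" }
    if ($Value -gt $Max) { return ">$Max" }

    # Count whole intervals from Min to find where this value's band starts.
    $start = $Min + [math]::Floor(($Value - $Min) / $Interval) * $Interval
    "$start-$($start + $Interval)"
}

# Bucket some sample amounts the way a numeric pivot grouping would.
3, 12, 17, 25, 38, 99, 140 |
    Group-Object { Get-NumericBand -Value $_ } |
    Select-Object Name, Count
```

In this sketch, a value exactly equal to -Max starts a band of its own.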

Set-ExcelRow and Set-ExcelColumn now check that the worksheet name they are passed exists in the workbook.

What's new 5.4.0

  • Thank you to Conrad Agramont, Twitter: @AGramont for the AddMultiWorkSheet.ps1 example. Much appreciated!
  • Fixed several more bugs where parameters were ignored if passed a zero value
  • Fixed a bug where chart series headers could not come from a cell reference (=Sheet1!Z10 now works as a header reference)
  • Add-Chart will now allow a single X range, or as many X ranges as there are Y ranges.
  • Merge-MultipleSheets is more robust.
  • Set-ExcelRow and Set-ExcelColumn trap attempts to process a sheet with no rows/columns.
  • Help has been proof-read (thanks to Mrs. @Jhoneill !).

What's new 5.3.4

What's new 5.3.3

  • Thank you to lazywinadmin (https://github.com/lazywinadmin) for expanding aliases in examples and elsewhere
  • In Export-Excel fixed a bug where -AutoNameRange on pre-existing data included the header in the range.
  • In Export-Excel fixed a bug which caused a zero, null, or empty string in a list of simple objects to be skipped.
  • In Export-Excel improved the behaviour when a new worksheet is created without data, and Tables etc are added to it.
  • In Join-Worksheet: added argument completer to -TitleBackgroundColor and set default for -TitleBackgroundStyle to 'Solid'.
  • In Add-ExcelChart, New-ExcelChart, tests and Examples, fixed the misspelling of "Position"
  • In Send-SqlDataToExcel: improved robustness of check for no data returned.
  • In Set-ExcelColumn: -Column can come from the pipeline (supporting an array introduces complications for supporting script blocks); -AutoNameRange no longer requires a heading to be specified (so you can do 1..10 | Set-ExcelColumn -AutoNameRange). In Set-ExcelRow: -Row can come from the pipeline
  • Improved test coverage (back over 80%).
  • Help and example improvements. In "Index - music.ps1", the module for querying the index can be downloaded from the PowerShell Gallery, and #Requires is set to demand it. In SQL+FillColumns+Pivot\example2.ps1, the GetSQL module can be downloaded and #Requires has been set. The F1 results spreadsheet is available from OneDrive, and a link is provided.
  • Added Azure DevOps continuous integration and badges

What's new in Release 5.3

  • Help improvements and tidying up of examples and extra examples
  • Open-ExcelPackage and Add-Worksheet now add worksheets as script properties, so after $Excel = Open-ExcelPackage -Path test.xlsx, $Excel.sheet1 will return the sheet named "sheet1". $Excel.SheetName is a script property which is defined as $this.Workbook.Worksheets["SheetName"]
  • Renamed Set-Column to Set-ExcelColumn, Set-Row to Set-ExcelRow, and Set-Format, to Set-ExcelRange. Added aliases so the old names still work.
  • Set-ExcelRange (or Set-Format) used "Address" and "Range" incorrectly. There is now a single parameter, -Range, with an alias of "Address". If the worksheet parameter is present, the function accepts a string specifying cells ("A1:Z10") or the name of a range. Without the worksheet, it accepts an object representing a named range or a table; or a table's address, or part of the worksheet.cells collection.
  • Add-ConditionalFormatting now uses "Address" correctly, and it will accept ranges in the address parameter (Range is now an alias for Address). It now wraps conditional value strings in quotes when needed (for = <= >= operations the string needs to be in double quotes; see issue #424). Parameter IntelliSense has been improved. There are new parameters, -StopIfTrue and -Priority, and support for using the -Reverse parameter with color-scale rules (issue #430). Booleans in the sheet are now supported as the value for a condition. Also brought the two different kinds of condition together inside Export-Excel, and fixed a bug where named ranges didn't work in some places. In New-ConditionalText, more types of conditional format are supported, and the argument completer for -ConditionalTextColor, which was missing, has been added.
  • Improved handling of hyperlinks in Export-Excel (see issue #426)
  • Export-Excel has better checking of Table and PivotTable names (for uniqueness) and a new test in quick charts that there is suitable data for charting. It also accepts hash tables for chart, pivot table and conditional formatting parameters which are splatted into the functions which add these.
  • Moved logic for adding a named-range out of Export-Excel and into a new function named Add-ExcelName, and logic for adding a table into a function named Add-ExcelTable; this is to make it easier to do these things independently of Export-Excel, but minimize duplication. The Add-ExcelTable command has extra parameters to toggle the options from table tools toolbar (show totals etc.) and set options in the totals row.
  • Moved PivotTable Functions out of Export-Excel.PS1 into their own file and moved Add-ExcelChart out of Export-Excel.ps1 into New-ExcelChart.ps1
  • Fixed bug in Merge-MultipleSheets where background pattern was set to None, making background color invisible.
  • Fixed issues where formatting could be reset when using Export-Excel to manipulate an existing sheet without appending data; this applied to number-formats and tables.
  • Add-PivotTable has some new parameters: -PassThru returns the pivot table (e.g. to allow names/sort orders of data series to be tweaked); -Address allows the pivot to be placed on an existing sheet; -PivotTableStyle allows a change from "Medium6"; -PivotNumberFormat formats data cells. It is more flexible about how the source data is specified, copying the range options in Set-ExcelRange. Add-ExcelChart is now used for creating PivotCharts, and -PivotChartDefinition allows a definition created with New-ExcelChartDefinition to be used when setting up a PivotTable. This opens up all the things that Add-ExcelChart can do without duplicating the parameters on Add-PivotTable and Export-Excel. Definition, TableStyle, NumberFormat and ChartDefinition can be used in New-PivotTableDefinition.
  • Add-ExcelChart now supports -PassThru to return the chart for tweaking after creation; there is now a -PivotTable parameter to allow Add-PivotTable to call the code in Add-ExcelChart. And in New-ExcelChartDefinition Legend parameters (for size, bold & position ) are now supported
  • ChartDefinition and conditional formatting parameters can now be hashtables: anything that can be splatted into Add-ExcelChart or Add-ConditionalFormatting should be acceptable as a definition.
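Here is a sketch that uses several of these 5.3 changes together. It assumes ImportExcel 5.3 or later. It reaches a worksheet through its script property, calls Set-ExcelRange with a string -Range, and drives Add-ConditionalFormatting from a splatted hashtable. The data, the file name, and the threshold of 100 are made up.

```powershell
# Assumes ImportExcel 5.3+ is installed; data, file name and threshold are hypothetical.
Import-Module ImportExcel

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'format-demo.xlsx'
Remove-Item -Path $path -ErrorAction SilentlyContinue
@(
    [pscustomobject]@{ Region = 'East'; Sales = 120 }
    [pscustomobject]@{ Region = 'West'; Sales = 80 }
) | Export-Excel -Path $path -WorksheetName Sales

$excel = Open-ExcelPackage -Path $path
# Worksheets are exposed as script properties: same as $excel.Workbook.Worksheets['Sales'].
$sheet = $excel.Sales

# With -Worksheet present, -Range accepts an address string.
Set-ExcelRange -Worksheet $sheet -Range 'A1:B1' -Bold

# Conditional-formatting settings held in a hashtable and splatted.
$highlight = @{
    Address         = 'B2:B3'
    RuleType        = 'GreaterThan'
    ConditionValue  = 100
    BackgroundColor = 'LightGreen'
}
Add-ConditionalFormatting -Worksheet $sheet @highlight

Close-ExcelPackage -ExcelPackage $excel
```

Keeping the formatting rule in a hashtable lets the same definition be reused for other sheets or passed around as data.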

What's new in Release 5.2

  • Value does not need to be mandatory in Set-Row or Set-Column, also tidied their parameters a little.
  • Added support for array formulas in Set-Format (it really should be set range now that it sets values, formulas and hyperlinks; that can go on the to-do list)
  • Fixed a bug with -Append in Export-Excel which caused it to overwrite the last row if the new data was a simple type.
  • NumberFormat in Export-Excel now sets the default for a new/blank sheet, but [still] sets individual cells when adding to a sheet
  • Added support for timespans in Export-Excel; they are set as elapsed hours, minutes, seconds ([h]:mm:ss)
  • In Export-Excel improved the catch-all handler for insuring values to cope better with nested objects (#419) and reduce the number of parse operations
  • Added -Calculate switch to Export-Excel and Close-ExcelPackage; EPPlus needs formulas to OMIT the leading = sign, so where a formula is set it now strips a leading = sign
  • Added a -PivotTotals parameter alongside the existing -NoTotalsInPivot; the new one allows None, Both, Rows, Columns. (#415)
  • When appending, Export-Excel only extended tables and ranges if they were explicitly specified. It now does it automatically.
  • Compare and Merge worksheet originally had a problem with more than 26 columns. I fixed Merge, but it turns out I hadn't fixed Compare ... I have now.
  • Fixed bug where Export-Excel would not recognize it had to set $TitleFillPattern - made the default 'Solid'
  • ExcludeProperty in Export-Excel now supports wildcards.
  • Added DateTime to the list of types which can be exported as single column.
  • Added Password support to Open- and Close-ExcelPackage (password was not doing anything in Export-Excel)
  • Gave Expand-NumberFormat a better grasp of currency layouts; it follows .NET, which is not always the same as what Excel would set :-(

What's new in Release 5.1.1

  • Set-Row and Set-Column will now create hyperlinks and insert dates correctly
  • Import-Excel now has an argument completer for Worksheet name - this can be slow on large files
  • The NumberFormat parameter (in Export-Excel, Set-Row, Set-Column, Set-Format and Add-ConditionalFormat) and X&YAxisNumberFormat parameters (in New-ExcelChartDefinition and Add-ExcelChart) now have an argument completer, and the names Currency, Number, Percentage, Scientific, Fraction, Short Date, Short time, Long time, Date-Time and Text will be converted to the correct Excel formatting strings.
  • Added new function Select-Worksheet to make a named sheet active: Added -Activate switch to Add-Worksheet, to make current sheet active, Export-Excel and Add-PivotTable support -Activate and pass it to Add-Worksheet, and New-PivotTableDefinition allows it to be part of the Pivot TableDefinition.
  • Fixed a bug in Set-Format which caused -Hidden not to work
  • Made the same changes to Add-ConditionalFormat as to Set-Format, so -switch:$false is processed, and 0 enums and values are processed correctly
  • In Export-Excel, wrapped calls to Add-CellValue in a try/catch so a value which causes an issue doesn't crash the whole export but generates a warning instead (#410).
  • Additional tests.

What's new to July 18

  • Changed parameter evaluation in Set-Format to support -Bold:$false (and other switches, so that if false is specified the attribute will be removed), and to fix a bug where enums with a value of zero, and other zero parameters, were not set.
  • Moved chart creation into its own function (Add-ExcelChart) within Export-Excel.ps1. Renamed New-ExcelChart to New-ExcelChartDefinition to make it clearer that it is not making anything in the workbook (but for compatibility put in an alias of New-ExcelChart so existing code does not break). Found that -Header does nothing, so it isn't in Add-ExcelChart, and there is a message saying that it does nothing in New-ExcelChartDefinition.
  • Added -BarChart -ColumnChart -LineChart -PieChart parameters to Export-Excel for quick charts without giving a full chart definition.
  • Added parameters for managing chart axes and legend
  • Added some chart tests to Export-Excel.tests.ps1 (but tests & examples for quick charts, axes or legends are still on the to-do list)
  • Fixed some bad code which had been checked in in error and caused adding charts to break. (This was not seen outside GitHub #377)
  • Added "Reverse" parameter to Add-ConditionalFormatting, and added -PassThru to make it easier to modify details of conditional formatting rules after creation (#396)
  • Refactored ConditionalFormatting code in Export-Excel to use Add-ConditionalFormatting.
  • Rewrote Copy-ExcelWorksheet to use copy functionality rather than import | export (#395)
  • Found sorts could be inconsistent in Merge-MultipleWorksheet, so now sort on more columns.
  • Fixed a bug introduced into Compare-Worksheet by the change described in the June changes below, this meant the font color was only being set in one sheet, when a row was changed. Also found that the PowerShell ISE and shell return Compare-Object results in different sequences which broke some tests. Applied a sort to ensure things are in a predictable order. (#375)
  • Removed (2) calls to Get-ExcelColumnName (Removed and then restored function itself)
  • Fixed an issue in Export-Excel where formulas were inserted as strings if "NoNumberConversion" is applied (#374), and made sure formatting is applied to formula cells
  • Fixed an issue with parameter sets in Export-Excel not being determined correctly in some cases (I think this had been resolved before and might have regressed)
  • Reverted the [double]::TryParse in Export-Excel to the previous (longer) way, as the shorter way was not behaving correctly with the number formats in certain regions. (also #374)
  • Changed Table, Range and AutoRangeNames to apply to the whole data area if no data has been inserted, OR to inserted data only if it has (#376). This means that if there are multiple inserts, only inserted data is touched, rather than going as far down and/or right as the furthest used cell. Added a test for this.
  • Added more of the parameters from Export-Excel to Join-Worksheet; Join just calls Export-Excel with these parameters, so there is no code behind them (#383)
  • Added more of the parameters from Export-Excel to Send-SQLDataToExcel; Send just calls Export-Excel with these parameters...
  • Added support for passing a System.Data.DataTable directly to Send-SQLDataToExcel
  • Fixed a bug in Merge-MultipleSheets where if the key was "name", columns like "displayName" would not be processed correctly, nor would names like "something_ROW". Added tests for Compare, Merge and Join Worksheet
  • Add-Worksheet: fixed a regression with move-after (#392), and changed the way the default worksheet name is decided. If none is specified, an existing worksheet is copied (see June additions), and the name doesn't already exist, the original sheet name will be kept (#393). If no name is given and a blank sheet is created, it will be named SheetX, where X is the number of the sheet (so if you have sheets FOO and BAR, the new sheet will be Sheet3).

New in June 18

  • New commands: Diff, Merge and Join

Compare-Worksheet (introduced in 5.0) uses the built-in Compare-Object command to output a command-line DIFF and/or color the worksheet to show differences. For example, if my sheets are Windows services, the extra rows or rows where the startup status has changed get highlighted.
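Compare-Worksheet builds on Compare-Object. The sketch below is not the module's code. It only shows the underlying Compare-Object call on two hypothetical in-memory service lists, so you can see how rows that differ are reported once per side. Compare-Worksheet adds reading the sheets, keying rows, and coloring cells on top of this.

```powershell
# Hypothetical service snapshots; Compare-Worksheet would read these from worksheets.
$serverA = @(
    [pscustomobject]@{ Name = 'Spooler'; StartType = 'Automatic' }
    [pscustomobject]@{ Name = 'W32Time'; StartType = 'Manual' }
    [pscustomobject]@{ Name = 'WinRM';   StartType = 'Automatic' }
)
$serverB = @(
    [pscustomobject]@{ Name = 'Spooler'; StartType = 'Disabled' }
    [pscustomobject]@{ Name = 'W32Time'; StartType = 'Manual' }
    [pscustomobject]@{ Name = 'BITS';    StartType = 'Manual' }
)

# Rows that differ in any compared property appear once per side:
# '<=' means the row is only in the reference set, '=>' only in the difference set.
# Identical rows (W32Time here) are omitted unless -IncludeEqual is used.
$diff = Compare-Object -ReferenceObject $serverA -DifferenceObject $serverB -Property Name, StartType

$diff | Sort-Object Name, SideIndicator | Format-Table Name, StartType, SideIndicator
```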

Merge-Worksheet (also introduced in 5.0) joins two lumps side by side, highlighting the differences. So now I can have Server A's services and Server B's services on the same page. I figured out a way to do multiple sheets, so I can have Servers A, B, C and D on one page :-) That is Merge-MultipleSheets.

For this release I've fixed heaven only knows how many typos and proof reading errors in the help for these two, the only code change is to fix a bug if two worksheets have different names, are in different files and the Comparison sends the delta in the second back before the one in first, then highlighting changed properties could throw an error. Correcting the spelling of Merge-MultipleSheets is potentially a breaking change (and it is still plural!)

Also fixed a bug in Compare-Worksheet where color might not be applied correctly when the worksheets came from different files and had different names.

Join-Worksheet is new for this release. At its simplest, it copies all the data in Worksheet A to the end of Worksheet B.

  • Add-Worksheet
    • I have moved this from ImportExcel.psm1 to ExportExcel.ps1. It can now move a new worksheet to the right place and copy an existing worksheet (from the same or a different workbook) to a new one. I also set the return type to aid IntelliSense.
  • New-PivotTableDefinition
    • Now supports -PivotFilter and -PivotDataToColumn, -ChartHeight/Width, -ChartRow/Column, -ChartRow/ColumnPixelOffset parameters
  • Set-Format
    • Fixed a bug where the -Address parameter had to be named, although the examples in Export-Excel help showed it working by position (which works now).
  • Export-Excel
    • I've done some re-factoring
      1. I "flattened out" small "called-once" functions: add-title, convert-toNumber and Stop-ExcelProcess.
      2. It now uses Add-Worksheet, Open-ExcelPackage and Add-ConditionalFormat instead of duplicating their functionality.
      3. I've moved the PivotTable functionality (which was doubled up) out to a new function "Add-PivotTable" which supports some extra parameters PivotFilter and PivotDataToColumn, ChartHeight/width ChartRow/Column, ChartRow/ColumnPixelOffsets.
      4. I've made the try{} catch{} blocks cover smaller blocks of code to give a better idea where a failure happened, some of these now Warn instead of throwing - I'd rather save the data with warnings than throw it away because we can't add a chart. Along with this I've added some extra write-verbose messages
    • Bad column-names specified for Pivots now generate warnings instead of throwing.
    • Fixed issues when pivot tables / charts already exist and an export tries to create them again.
    • Fixed issue where AutoNamedRange, NamedRange, and TableName do not work when appending to a sheet which already contains the range(s) / table
    • Fixed issue where AutoNamedRange may try to create ranges with an illegal name.
    • Added check for illegal characters in RangeName or Table Name (replace them with "_"), changed tablename validation to allow spaces and applied same validation to RangeName
    • Fixed a bug where BoldTopRow always bolded row 1 even if the export was told to start at a lower row.
    • Fixed a bug where titles throw pivot table creation out of alignment.
    • Fixed a bug where Append can overwrite the last rows of data if the initial export had blank rows at the top of the sheet.
    • Removed the need to specify a fill type when specifying a title background color
    • Added MoveToStart, MoveToEnd, MoveBefore and MoveAfter Parameters - these go straight through to Add worksheet
    • Added "NoScriptOrAliasProperties" "DisplayPropertySet" switches (names subject to change) - combined with ExcludeProperty these are a quick way to reduce the data exported (and speed things up)
    • Added PivotTableName Switch (in line with 5.0.1 release)
    • Add-CellValue now understands URI item properties: if a property is of type URI, it is created as a hyperlink. To speed up Add-CellValue:
      • Commented out the Write-Verbose statements. Even when verbose output is silenced, they cause a significant performance impact, and when it is on, they cause a flood of messages.
      • Re-ordered the choices in the switch and added an option to say "If it is numeric already, post it as is"
      • Added an option to only set the number format if it doesn't match the default for the sheet.
  • Export-Excel Pester Tests
    • I have converted examples 1-9, 11 and 13 from Export-Excel help into tests and have added some additional tests, and extra parameters to the example command to get better test coverage. The test so far has 184 "should" conditions grouped as 58 "IT" statements; but is still a work in progress.
  • Compare-Worksheet pester tests
  • James O'Neill added Compare-Worksheet
    • Compares two worksheets with the same name in different files.


Thanks to the community yet again

  • ili101 for fixes and features
    • Removed [PSPlot] as OutputType. Fixes it throwing an error
  • Nasir Zubair added ConvertEmptyStringsToNull to the function ConvertFrom-ExcelToSQLInsert
    • If specified, cells without any data are replaced with NULL instead of an empty string. This addresses behavior in certain DBMSs where an empty string is inserted as 0 into an INT column instead of a NULL value.
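Here is a plain-PowerShell sketch of the idea behind -ConvertEmptyStringsToNull. ConvertTo-SqlInsertSketch is a hypothetical helper, not ConvertFrom-ExcelToSQLInsert itself. It quotes every value as a string, which is a simplification that a real loader would refine per column type. Only actual empty strings are converted, so a numeric 0 is never mistaken for "empty".

```powershell
# Hypothetical helper, not the module's ConvertFrom-ExcelToSQLInsert:
# builds one INSERT statement per input row.
function ConvertTo-SqlInsertSketch {
    param(
        [Parameter(Mandatory)][string]$TableName,
        [Parameter(Mandatory, ValueFromPipeline)][psobject]$InputObject,
        [switch]$ConvertEmptyStringsToNull
    )
    process {
        $names = $InputObject.PSObject.Properties.Name
        $values = foreach ($name in $names) {
            $v = $InputObject.$name
            if ($null -eq $v -or ($ConvertEmptyStringsToNull -and $v -is [string] -and $v.Length -eq 0)) {
                'NULL'
            }
            else {
                # Quote the value and double any embedded single quotes.
                "'" + ([string]$v).Replace("'", "''") + "'"
            }
        }
        'INSERT INTO {0} ({1}) VALUES ({2});' -f $TableName, ($names -join ', '), ($values -join ', ')
    }
}

[pscustomobject]@{ Id = '1'; Qty = ''; Note = "O'Brien" } |
    ConvertTo-SqlInsertSketch -TableName Orders -ConvertEmptyStringsToNull
# INSERT INTO Orders (Id, Qty, Note) VALUES ('1', NULL, 'O''Brien');
```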


New parameter -ReZip. It re-zips the xlsx so it can be imported into Power BI.

Thanks to Justin Grote for finding and fixing the error where the Excel files created would not import into Power BI Online. Plus, thank you to CrashM for confirming the fix.

Super helpful!


  • Updated Set-Format
    • Added parameters to set borders for cells, including top, bottom, left and right
    • Added parameters to set value and formula
$data = @"
Atlanta,New York,3602000,.0809,955000,.09,245,65
New York,Washington,4674000,.105,336000,.03,222,16
Chicago,New York,4674000,.0804,1536000,.14,550,43
New York,Philadelphia,12180000,.1427,-716000,-.07,321,-25
New York,San Francisco,3221000,.0629,1088000,.04,436,21
New York,Phoenix,2782000,.0723,467000,.10,674,33
"@

  • Added -PivotFilter parameter, allows you to set up a filter so you can drill down into a subset of the overall dataset.


Thank you to James O'Neill for fixing bugs with the ChangeDatabase parameter that prevented it from working.

Added -Force to New-Alias

Added an example to set the background color of a column

Added support for excluding Row Grand Totals for PivotTables

Allowed xlsm files to be read

Fix Set-Column.ps1, Set-Row.ps1, SetFormat.ps1, formatting.ps1 $false and $BorderRound


Added switch [Switch]$NoTotalsInPivot. Allows hiding of the row totals in the pivot table.

Thank you to jameseholt for the request.

    get-process | where Company | select Company, Handles, WorkingSet |
        export-excel C:\temp\testColumnGrand.xlsx `
            -Show -ClearSheet  -KillExcel `
            -IncludePivotTable -PivotRows Company -PivotData @{"Handles"="average"} -NoTotalsInPivot
  • Fixed an error thrown when using certain ChartType values for the pivot table chart
  • Fixed: when you specify a file and the directory does not exist, the directory is now created
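Creating a missing output folder is a small pattern in its own right. The sketch below assumes the fix works roughly like this. Initialize-OutputDirectory is a hypothetical name, not the module's code.

```powershell
# Hypothetical helper: make sure the folder for an output file exists before writing to it.
function Initialize-OutputDirectory {
    param([Parameter(Mandatory)][string]$Path)

    $directory = Split-Path -Path $Path -Parent
    if ($directory -and -not (Test-Path -LiteralPath $directory)) {
        # Creates the folder, including any missing parent folders
        New-Item -ItemType Directory -Path $directory -Force | Out-Null
    }
}
```

Call it with the full output path, for example `Initialize-OutputDirectory -Path 'C:\Temp\reports\test.xlsx'`, before the file is written.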


More great additions and thanks to James O'Neill

  • Added Convert-XlRangeToImage: gets the specified part of an Excel file and exports it as an image
  • Fixed a typo in the message at line 373.
  • Now catches an attempt to both clear the sheet and append to it.
  • Fixed some issues when appending to sheets where the header isn't in row 1 or the data doesn't start in column 1.
  • Added support for more settings when creating a pivot chart.
  • Corrected a typo: PivotTableName was spelled PivtoTableName in the definition of New-PivotTableDefinition
  • Added parameters to Add-ConditionalFormat and Set-Format so each has the choice of working more like the other.
  • Added Set-Row and Set-Column - fill a formula down or across.
  • Added Send-SQLDataToExcel. Insert a rowset and then call Export-Excel for ranges, charts, pivots etc.
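Rejecting contradictory switches can be sketched as a guard at the top of a function. Test-SheetWriteMode and its return values are hypothetical. Only the -ClearSheet and -Append names come from Export-Excel.

```powershell
# Hypothetical function illustrating the guard; not the Export-Excel source.
function Test-SheetWriteMode {
    [CmdletBinding()]
    param(
        [switch]$ClearSheet,
        [switch]$Append
    )

    # Both switches together would wipe the data that -Append is meant to extend
    if ($ClearSheet -and $Append) {
        throw 'Use either -ClearSheet or -Append, not both.'
    }

    if ($ClearSheet) { 'Clear' } elseif ($Append) { 'Append' } else { 'Default' }
}
```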


Huge thanks to James O'Neill, PowerShell aficionado. He always brings flair when working with PowerShell, and this is no exception.

(Check out the examples: help Export-Excel -Examples)

  • New parameter Package allows an ExcelPackage object returned by -PassThru to be passed in
  • New parameter ExcludeProperty removes unwanted properties without needing to go through Select-Object
  • New parameter Append reads the existing headers and moves the insertion point below the current data
  • New parameter ClearSheet removes the worksheet and any past data
  • Removes any existing pivot table before trying to [re]create it
  • Checks for inserting a pivot table, so if -IncludePivotChart is specified it implies -IncludePivotTable

(Check out the examples: help Export-Excel -Examples)

  • New function Export-Charts (requires Excel to be installed) - Export Excel charts out as JPG files
  • New function Add-ConditionalFormatting adds conditional formatting to a worksheet
  • New function Set-Format applies number, font, alignment and color formatting to a range of Excel cells
  • ColorCompletion: an argument completer for color parameters across functions

I also worked out the parameters so you can do this, which is the same as passing -Now. It creates an Excel file name for you, does an auto fit and sets up filters.

ps | select Company, Handles | Export-Excel


Added New-PivotTableDefinition. You can create and wire up a PivotTable to a worksheet. You can also create as many PivotTable worksheets as you like pointing to one worksheet. Or, you can create many worksheets and many corresponding PivotTable worksheets.

Here you can create a worksheet with the data from Get-Service. Then create four PivotTables pointing to that data, each pivoting on a different dimension and showing a different chart.

$base = @{
    SourceWorkSheet   = 'gsv'
    PivotData         = @{'Status' = 'count'}
    IncludePivotChart = $true
}

$ptd = [ordered]@{}

$ptd += New-PivotTableDefinition @base servicetype -PivotRows servicetype -ChartType Area3D
$ptd += New-PivotTableDefinition @base status -PivotRows status -ChartType PieExploded3D
$ptd += New-PivotTableDefinition @base starttype -PivotRows starttype -ChartType BarClustered3D
$ptd += New-PivotTableDefinition @base canstop -PivotRows canstop -ChartType ConeColStacked

$file = "C:\Temp\gsv.xlsx"   # example output path
Get-Service | Export-Excel -path $file -WorkSheetname gsv -Show -PivotTableDefinition $ptd


Thanks to https://github.com/ili101 :

  • Fixed bug: Unable to find type [PSPlot]
  • Fixed bug: AutoFilter with TableName creates a corrupted Excel file.


Thanks to Jeremy Brun, who fixed issues related to using the -Title parameter combined with column formatting parameters.

9/28/2017 (Version 4.0.1)

Added a new parameter called Password to import password protected files

Added even more Pester tests for a more robust and bug free module

Renamed parameter 'TopRow' to 'StartRow'

This allows us to be more consistent when new parameters ('StartColumn', ..) are added in the future. Your code will not break after the update, because we added an alias for backward compatibility.

Special thanks to robinmalik for providing us with the code to implement this new feature. A high five to DarkLite1 for the implementation.
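The backward-compatible rename relies on PowerShell's standard [Alias()] parameter attribute. Here is a minimal sketch with a hypothetical function. It is not the real Import-Excel signature.

```powershell
# Hypothetical function: the parameter is now called StartRow, and the
# alias keeps older scripts that still pass -TopRow working unchanged.
function Import-SampleSheet {
    param(
        [Alias('TopRow')]
        [int]$StartRow = 1
    )
    "Reading from row $StartRow"
}

Import-SampleSheet -StartRow 3   # Reading from row 3
Import-SampleSheet -TopRow 3     # same result via the old name
```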

9/12/2017 (Version 4.0.0)

Super thanks and hat tip to DarkLite1. There is now a new and improved Import-Excel, not only in functionality, but also improved readability, examples and more. Not only that, he's been running it in production in his company for a number of weeks!

Added Update-FirstObjectProperties: it updates the first object to contain all the properties of the object with the most properties in the array. Check out the help.
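Why this matters: cmdlets such as Export-Csv build their columns from the first object they receive. A property that only appears on later objects is therefore silently dropped. The sketch below shows the described technique. It uses a hypothetical function name and is not the module's own implementation.

```powershell
# Hypothetical sketch: copy the property names of the widest object onto the first one.
function Update-FirstObjectSketch {
    param([Parameter(Mandatory)][object[]]$InputObject)

    # Pick the object that carries the most properties
    $widest = $InputObject |
        Sort-Object -Property { @($_.PSObject.Properties).Count } -Descending |
        Select-Object -First 1

    $first = $InputObject[0]
    foreach ($name in $widest.PSObject.Properties.Name) {
        if ($name -notin $first.PSObject.Properties.Name) {
            # Add an empty property so exporters see the full column set
            $first | Add-Member -NotePropertyName $name -NotePropertyValue $null
        }
    }
    $InputObject
}
```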

Breaking Changes: Because a large portion of the code was rewritten, slightly different behavior can be expected from the Import-Excel function. This is especially true for importing empty Excel files with or without using the TopRow parameter. To make sure that your code is still valid, please check the examples in the help or the accompanying Pester test file.

Moving forward, we are planning to include automatic testing with the help of Pester, Appveyor and Travis. From now on any changes in the module will have to be accompanied by the corresponding Pester tests to avoid breakages of code and functionality. This is in preparation for new features coming down the road.


Thanks to Mikkel Nordberg. He contributed a ConvertTo-ExcelXlsx. To use it, Excel needs to be installed. The function converts the older Excel file format ending in .xls to the new format ending in .xlsx.


Huge thank you to DarkLite1! Refactoring of code, adding help, adding features, fixing bugs. Specifically this long outstanding one:

Export-Excel: Numeric values not correct

It is fantastic to work with people like DarkLite1 in the community to help make the module so much better. Hats off to you.

Another shout out to Damian Reeves! His questions turn into great features. He asked if it was possible to import an Excel worksheet and transform the data into SQL INSERT statements. We can now answer that question with a big YES!

ConvertFrom-ExcelToSQLInsert People .\testSQLGen.xlsx
INSERT INTO People ('First', 'Last', 'The Zip') Values('John', 'Doe', '12345');
INSERT INTO People ('First', 'Last', 'The Zip') Values('Jim', 'Doe', '12345');
INSERT INTO People ('First', 'Last', 'The Zip') Values('Tom', 'Doe', '12345');
INSERT INTO People ('First', 'Last', 'The Zip') Values('Harry', 'Doe', '12345');
INSERT INTO People ('First', 'Last', 'The Zip') Values('Jane', 'Doe', '12345');
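The sketch below shows how rows can be turned into statements of the shape shown above. It assumes the data is already available as objects, for example from Import-Excel. ConvertTo-InsertSketch is a hypothetical name. To mirror the output above, it puts column names in single quotes. Be aware that many SQL dialects treat single-quoted text as a string literal and expect identifiers bare, in double quotes, or in brackets.

```powershell
# Hypothetical helper (not ConvertFrom-ExcelToSQLInsert itself):
# builds one INSERT statement per row object.
function ConvertTo-InsertSketch {
    param(
        [Parameter(Mandatory)][string]$TableName,
        [Parameter(Mandatory)][object[]]$Row
    )

    foreach ($record in $Row) {
        $names = $record.PSObject.Properties.Name
        # Column list, quoted exactly like the output above
        $columns = ($names | ForEach-Object { "'$_'" }) -join ', '
        # Values: double any embedded single quotes, then wrap in quotes
        $values = ($names | ForEach-Object { "'{0}'" -f ($record.$_ -replace "'", "''") }) -join ', '
        "INSERT INTO $TableName ($columns) Values($values);"
    }
}

$people = @(
    [pscustomobject]@{ First = 'John'; Last = 'Doe'; 'The Zip' = '12345' }
)
ConvertTo-InsertSketch -TableName People -Row $people
```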

Bonus Points

Use the underlying ConvertFrom-ExcelData function and you can use a scriptblock to format the data however you want.

ConvertFrom-ExcelData .\testSQLGen.xlsx {
    param($propertyNames, $record)

    $reportRecord = @()
    foreach ($pn in $propertyNames) {
        $reportRecord += "{0}: {1}" -f $pn, $record.$pn
    }
    $reportRecord += ""
    $reportRecord -join "`r`n"
}


First: John
Last: Doe
The Zip: 12345

First: Jim
Last: Doe
The Zip: 12345

First: Tom
Last: Doe
The Zip: 12345

First: Harry
Last: Doe
The Zip: 12345

First: Jane
Last: Doe
The Zip: 12345


Thank you to DarkLite1 for more updates

  • TableName with parameter validation, throws an error when the TableName:
    • Starts with something other than a letter
    • Is NULL or empty
    • Contains spaces
  • Numeric parsing now uses CurrentInfo, so it follows the system's regional settings
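The three TableName rules above map naturally onto a [ValidateScript()] attribute, which makes PowerShell throw before the function body runs. This sketch uses hypothetical names. Treating "letter" as any Unicode letter (\p{L}) is an assumption.

```powershell
# Hypothetical validator mirroring the three rules listed above
function Test-ExcelTableName {
    param([string]$TableName)

    if ([string]::IsNullOrEmpty($TableName)) { return $false }  # NULL or empty
    if ($TableName -notmatch '^\p{L}') { return $false }       # must start with a letter
    if ($TableName -match '\s') { return $false }              # no spaces (or other whitespace)
    return $true
}

# Hypothetical consumer: the attribute makes PowerShell throw during parameter binding
function New-SampleTable {
    param(
        [Parameter(Mandatory)]
        [ValidateScript({ Test-ExcelTableName $_ })]
        [string]$TableName
    )
    "Creating table $TableName"
}

New-SampleTable -TableName 'Sales2017'   # Creating table Sales2017
```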

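The sketch below shows culture-aware numeric parsing. It assumes that CurrentInfo refers to .NET's NumberFormatInfo.CurrentInfo, which holds the number format of the current culture. The helper name is hypothetical. The demo builds a German-style format explicitly, so the result does not depend on the machine's regional settings.

```powershell
# Hypothetical helper: return a [double] when the text is numeric under the given
# number format (the current culture's settings by default), otherwise the text unchanged.
function ConvertTo-NumberIfNumeric {
    param(
        [string]$Text,
        [System.Globalization.NumberFormatInfo]$Format = [System.Globalization.NumberFormatInfo]::CurrentInfo
    )

    $number = 0.0
    # Number = optional sign, thousands separators and a decimal point
    $styles = [System.Globalization.NumberStyles]::Number
    if ([double]::TryParse($Text, $styles, $Format, [ref]$number)) {
        return $number
    }
    $Text
}

# German-style separators: '.' groups thousands, ',' marks decimals
$germanStyle = [System.Globalization.NumberFormatInfo]::new()
$germanStyle.NumberDecimalSeparator = ','
$germanStyle.NumberGroupSeparator = '.'

ConvertTo-NumberIfNumeric -Text '1.234,5' -Format $germanStyle   # returns the double 1234.5
ConvertTo-NumberIfNumeric -Text 'n/a'                            # returns the text n/a
```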

Big thanks to DarkLite1 for some great updates

-DataOnly switch added to Import-Excel. When used it will only generate objects for rows that contain text values, not for empty rows or columns.

Get-ExcelWorkBookInfo retrieves information about an Excel workbook.

      Get-ExcelWorkbookInfo .\Test.xlsx

      CorePropertiesXml     : #document
      Title                 :
      Subject               :
      Author                : Konica Minolta User
      Comments              :
      Keywords              :
      LastModifiedBy        : Bond, James (London) GBR
      LastPrinted           : 2017-01-21T12:36:11Z
      Created               : 17/01/2017 13:51:32
      Category              :
      Status                :
      ExtendedPropertiesXml : #document
      Application           : Microsoft Excel
      HyperlinkBase         :
      AppVersion            : 14.0300
      Company               : Secret Service
      Manager               :
      Modified              : 10/02/2017 12:45:37
      CustomPropertiesXml   : #document


  • Added -Now switch. This shortcuts the process: it automatically creates a temp file and enables the -Show, -AutoFilter and -AutoSize switches.
Get-Process | Select Company, Handles | Export-Excel -Now
  • Added ScriptBlocks for coloring cells. Check out Examples
$xlfile = "C:\Temp\testCellStyle.xlsx"   # example output path
Get-Process |
    Select-Object Company,Handles,PM, NPM|
    Export-Excel $xlfile -Show  -AutoSize -CellStyleSB {
        param($workSheet, $totalRows, $lastColumn)

        Set-CellStyle $workSheet 1 $LastColumn Solid Cyan

        foreach($row in (2..$totalRows | Where-Object {$_ % 2 -eq 0})) {
            Set-CellStyle $workSheet $row $LastColumn Solid Gray
        }

        foreach($row in (2..$totalRows | Where-Object {$_ % 2 -eq 1})) {
            Set-CellStyle $workSheet $row $LastColumn Solid LightGray
        }
    }


Fixed PowerShell 3.0 compatibility. Thanks to headsphere. He used $obj.PSObject.Methods[$target] syntax to make it backward compatible. PS v4.0 and later allow $obj.$target.
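Here are the two syntaxes from this note, side by side. The string and method name are arbitrary choices for illustration. Per the note, the first form needs PowerShell 4.0 or later, while the second also runs on 3.0.

```powershell
$obj = 'hello'
$target = 'ToUpper'

# PowerShell 4.0 and later: call a method whose name is held in a variable
$obj.$target()

# Also works on PowerShell 3.0: look the method up by name, then invoke it
$obj.PSObject.Methods[$target].Invoke()
```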

Thank you to xelsirko for fixing this issue: Import-Module ImportExcel gives a version warning if started inside a background job


Fixed reading the headers from cells: moved from using the Text property to the Value property.


  • Added Copy-ExcelWorksheet. Lets you copy a worksheet from one Excel workbook to another.


  • Fixes Import-Excel #68


Attila Mihalicz fixed two issues

  • Removed extra spaces after the backtick
  • Uninitialized variable $idx leaks into the pipeline when -TableName parameter is used

Thanks Attila.


  • Pushed 2.2.7: fixed path resolution in Get-ExcelSheetInfo
  • Fixed Casting Error in Export-Excel
  • For Import-Excel change Resolve-Path to return ProviderPath for use with UNC
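Resolve-Path returns both a PowerShell path (Path) and the provider's native path (ProviderPath). .NET file APIs, which a library like EPPlus relies on, only understand the latter. A UNC path, for instance, can come back from Resolve-Path with a provider-qualified Path. The following minimal demonstration uses a temporary PowerShell drive. The drive name Demo is arbitrary.

```powershell
# Map a PowerShell-only drive onto the temp folder
New-PSDrive -Name Demo -PSProvider FileSystem -Root ([System.IO.Path]::GetTempPath()) | Out-Null

$resolved = Resolve-Path -LiteralPath 'Demo:\'
$resolved.Path           # starts with Demo:, which only PowerShell understands
$resolved.ProviderPath   # the real file-system path of the temp folder

Remove-PSDrive -Name Demo
```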


  • Added -UseDefaultCredentials to both Import-Html and Get-HtmlTable
  • New functions, Import-UPS and Import-USPS. Pass in a valid tracking # and it scrapes the page for the delivery details


Huge thank you to Willie Möller

  • He added a version check so the PowerShell classes don't cause issues for down-level versions of PowerShell
  • He also contributed the first Pester tests for the module. Super! Check them out, they'll be the way tests will be implemented going forward


Thanks to Paul Williams for this feature. Now data can be transposed to columns for better charting.

$file = "C:\Temp\ps.xlsx"
rm $file -ErrorAction Ignore

ps |
    where company |
    select Company,PagedMemorySize,PeakPagedMemorySize |
    Export-Excel $file -Show -AutoSize `
        -IncludePivotTable `
        -IncludePivotChart `
        -ChartType ColumnClustered `
        -PivotRows Company `
        -PivotData @{PagedMemorySize='sum';PeakPagedMemorySize='sum'}

Added -PivotDataToColumn

$file = "C:\Temp\ps.xlsx"
rm $file -ErrorAction Ignore

ps |
    where company |
    select Company,PagedMemorySize,PeakPagedMemorySize |
    Export-Excel $file -Show -AutoSize `
        -IncludePivotTable `
        -IncludePivotChart `
        -ChartType ColumnClustered `
        -PivotRows Company `
        -PivotData @{PagedMemorySize='sum';PeakPagedMemorySize='sum'} `
        -PivotDataToColumn

And here is the new chart view


Made more methods fluent

$t=Get-Range 0 5 .2


    Plot($t,$t, $t,$t2, $t,$t3).
    Title("Hello World").


  • Thanks to redoz, multi-series charts are now working

Also check out how you can create a table and then with Excel notation, index into the data for charting "Impressions[A]"

$data = @"
"@ | ConvertFrom-Csv

$c = New-ExcelChart -Title Impressions `
    -ChartType Line -Header "Something" `
    -XRange "Impressions[Date]" `
    -YRange @("Impressions[B]","Impressions[A]")

$data |
    Export-Excel temp.xlsx -AutoSize -TableName Impressions -Show -ExcelChartDefinition $c


  • Added NumberFormat parameter
$data |
    Export-Excel -Path $file -Show -NumberFormat '[Blue]$#,##0.00;[Red]-$#,##0.00'


  • Added Get-Range, New-Plot and Plot Cos example
  • Updated EPPlus DLL. Allows markers to be changed and colored
  • Handles and warns if auto name range names are also valid Excel ranges


  • Added Header and FirstDataRow for Import-Html


  • Added GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual to New-ConditionalText
echo 489 668 299 777 860 151 119 497 234 788 |
    Export-Excel c:\temp\test.xlsx -Show `
    -ConditionalText (New-ConditionalText -ConditionalType GreaterThan 525)



  • Added Conditional Text types of Equal and NotEqual
  • Phone numbers like '+33 011 234 34' will now be handled correctly

Try PassThru

$file = "C:\Temp\passthru.xlsx"
rm $file -ErrorAction Ignore

$xlPkg = $(
    New-PSItem north 10
    New-PSItem east  20
    New-PSItem west  30
    New-PSItem south 40
) | Export-Excel $file -PassThru


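$ws = $xlPkg.Workbook.Worksheets[1]

$ws.Cells["A3"].Value = "Hello World"
$ws.Cells["B3"].Value = "Updating cells"
$ws.Cells["D1:D5"].Value = "Data"

# Save the changes and release the file before opening it
$xlPkg.Save()
$xlPkg.Dispose()
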
Invoke-Item $file




  • Added Get-ExcelSheetInfo. Great contribution from Johan Åkerström; check him out on GitHub and Twitter


  • Added NoLegend, ShowCategory, ShowPercent for all charts including Pivot Charts
  • Updated PieChart, BarChart, ColumnChart and Line chart to work with the pipeline and added NoLegend, ShowCategory, ShowPercent


These new features open the door for really sophisticated worksheet creation.

Stay tuned for a blog post and examples.

Quick List

  • StartRow, StartColumn for placing data anywhere in a sheet
  • New-ExcelChart - Add charts to a sheet, multiple series for a chart, locate the chart anywhere on the sheet
  • AutoNameRange: use functions and/or calculations in a cell
  • Quick charting using PieChart, BarChart, ColumnChart and more


Big bug fix for version 3.0 PowerShell folks!

This technique fails in 3.0 and works in 4.0 and later.


Adding .invoke works in 3.0 and later.


A big thank you to DarkLite1 for adding the help to Export-Excel.

Added -HeaderRow parameter. Sometimes the heading does not start in Row 1.


Fixed: Export-Excel generates a corrupt Excel file


Import-Excel has a new parameter NoHeader. If the data in the sheet does not have headers and you don't want to supply your own, Import-Excel will generate the property names.
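This sketch shows what generating property names for header-less data can look like. Both the P1, P2, ... naming and the helper ConvertTo-NoHeaderObject are assumptions for illustration. Check the Import-Excel help for the names the module actually uses.

```powershell
# Hypothetical helper: turn one row of cell values into an object with generated property names.
function ConvertTo-NoHeaderObject {
    param([object[]]$CellValue)

    $properties = [ordered]@{}
    for ($i = 0; $i -lt $CellValue.Count; $i++) {
        # Column 1 becomes P1, column 2 becomes P2, and so on
        $properties["P$($i + 1)"] = $CellValue[$i]
    }
    [pscustomobject]$properties
}

ConvertTo-NoHeaderObject -CellValue 'John', 'Doe', 12345
```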

Import-Excel now returns .Value rather than .Text


Merged ValidateSet for Encoding and Extension. Thank you Irwin Strachan.


Export-Excel can now handle data that is not an object

echo a b c 1 $true 2.1 1/1/2015 | Export-Excel c:\temp\test.xlsx -Show


dir -Name | Export-Excel c:\temp\test.xlsx -Show


Hide worksheets: Got a great request from forensicsguy20012004 to hide worksheets. You create a few pivot tables and generate charts, and then the pivot table worksheets don't need to be visible.

Export-Excel now has a -HideSheet parameter that takes an array of worksheet names and hides them.


Here, you create four worksheets named PM,Handles,Services and Files.

The last line creates the Files sheet and then hides the Handles and Services sheets.

$p = Get-Process

$p|select company, pm | Export-Excel $xlFile -WorkSheetname PM
$p|select company, handles| Export-Excel $xlFile -WorkSheetname Handles
Get-Service| Export-Excel $xlFile -WorkSheetname Services

dir -File | Export-Excel $xlFile -WorkSheetname Files -Show -HideSheet Handles, Services

Note: There is a bug in EPPlus that does not let you hide the first worksheet created. Hopefully it'll be resolved soon.


Added Conditional formatting. See TryConditional.ps1 as an example.

Or, check out the short "How To" video.




  • For -PivotData you can pass a hashtable with the name of the property and the type of calculation: Count, Sum, Average, Max, Min, Product, StdDev, StdDevp, Var, Varp
Get-Service |
    Export-Excel "c:\temp\test.xlsx" `
        -Show `
        -IncludePivotTable `
        -PivotRows status `
        -PivotData @{status='count'}

6/16/2015 (Thanks Justin)

  • Improvements to PivotTable overwriting
  • Added two parameters to Export-Excel
    • RangeName - Turns the data piped to Export-Excel into a named range.
    • TableName - Turns the data piped to Export-Excel into an Excel table.


Get-Process|Export-Excel foo.xlsx -Verbose -IncludePivotTable -TableName "Processes" -Show
Get-Process|Export-Excel foo.xlsx -Verbose -IncludePivotTable -RangeName "Processes" -Show


  • Fixed null header problem


  • Added three parameters:
    • FreezeTopRow - Freezes the first row of the data
    • AutoFilter - Enables filtering for the data in the sheet
    • BoldTopRow - Bolds the top row of data, the column headers


Get-CimInstance win32_service |
    select state, accept*, start*, caption |
    Export-Excel test.xlsx -Show -BoldTopRow -AutoFilter -FreezeTopRow -AutoSize



  • Published to PowerShell Gallery. In PowerShell v5 use Find-Module importexcel then Find-Module importexcel | Install-Module


  • DateTime properties were displaying as ints; now they are formatted
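Dates showing up as numbers is consistent with how Excel stores them. Excel keeps a date as a serial day count, with the time of day as the fractional part, and it only looks like a date once a date number format is applied. .NET's OLE Automation date conversion uses the same numbering as Excel for dates from 1 March 1900 onward:

```powershell
# Convert a date to its serial number and back again
$serial = ([datetime]'2017-01-01').ToOADate()
$serial                            # 42736
[datetime]::FromOADate($serial)    # 1 January 2017, midnight

# A fractional serial carries the time of day: .5 is noon
[datetime]::FromOADate(42736.5)
```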


  • Now you can create multiple Pivot tables in one pass
    • Thanks to pscookiemonster, he submitted a repro case to the EPPlus CodePlex project and got it fixed


$ps = ps

$ps |
    Export-Excel .\testExport.xlsx  -WorkSheetname memory `
        -IncludePivotTable -PivotRows Company -PivotData PM `
        -IncludePivotChart -ChartType PieExploded3D
$ps |
    Export-Excel .\testExport.xlsx  -WorkSheetname handles `
        -IncludePivotTable -PivotRows Company -PivotData Handles `
        -IncludePivotChart -ChartType PieExploded3D -Show



  • Included and embellished Claus Nielsen's function to take all sheets in an Excel workbook and create a text file for each: ConvertFrom-ExcelSheet
  • Renamed Export-MultipleExcelSheets to ConvertFrom-ExcelSheet


  • You can add a title to the Excel "Report" with Title, TitleFillPattern, TitleBold, TitleSize, TitleBackgroundColor
    • Thanks to Irwin Strachan for this and other great suggestions, testing and more


  • Renamed AutoFitColumns to AutoSize
  • Implemented Export-MultipleExcelSheets
  • Implemented -Password for a worksheet
  • Replaced -Force switch with -NoClobber switch
  • Added examples for Get-Help
  • If Pivot table is requested, that sheet becomes the tab selected


  • Implemented exporting data to named sheets via the -WorkSheetname parameter.

Examples - gsv | Export-Excel .\test.xlsx -WorkSheetname Services

dir -file | Export-Excel .\test.xlsx -WorkSheetname Files

ps | Export-Excel .\test.xlsx -WorkSheetname Processes -IncludePivotTable -Show -PivotRows Company -PivotData PM

Convert (All or Some) Excel Sheets to Text files

Reads each sheet in TestSheets.xlsx and outputs it to the data directory as the sheet name with the extension .txt

ConvertFrom-ExcelSheet .\TestSheets.xlsx .\data

Reads sheets like Sheet10 and Sheet20 from TestSheets.xlsx and outputs them to the data directory as the sheet name with the extension .txt

ConvertFrom-ExcelSheet .\TestSheets.xlsx .\data sheet?0
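This assumes the sheet-name argument follows PowerShell's standard wildcard rules, which the sheet?0 example suggests. Under those rules, ? matches exactly one character and matching is case-insensitive. A quick check with -like:

```powershell
# ? matches exactly one character; -like is case-insensitive, so sheet?0 matches Sheet10
$sheetNames = 'Sheet1', 'Sheet10', 'Sheet20', 'Sheet100', 'Summary'
$sheetNames | Where-Object { $_ -like 'sheet?0' }   # Sheet10, Sheet20
```

Sheet100 is not selected because ? stands for exactly one character, not one or more.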

Example Adding a Title

You can set the pattern, the size, and whether the title is bold.

$p = @{
    Title = "Process Report as of $(Get-Date)"
    TitleFillPattern = "LightTrellis"
    TitleSize = 18
    TitleBold = $true

    Path  = "$pwd\testExport.xlsx"
    Show = $true
    AutoSize = $true
}

Get-Process |
    Where Company | Select Company, PM |
    Export-Excel @p


Example Export-MultipleExcelSheets


$p = Get-Process

$DataToGather = @{
    PM        = {$p|select company, pm}
    Handles   = {$p|select company, handles}
    Services  = {gsv}
    Files     = {dir -File}
    Albums    = {(Invoke-RestMethod http://www.dougfinke.com/PowerShellfordevelopers/albums.js)}
}

Export-MultipleExcelSheets -Show -AutoSize .\testExport.xlsx $DataToGather

NOTE: If the sheet exists when using the -WorkSheetname parameter, it will be deleted and then added with the new data.

Get-Process Exported to Excel

Total Physical Memory Grouped By Company


Importing data from an Excel spreadsheet


You can also find EPPlus on NuGet.

Known Issues

  • Using -IncludePivotTable, if that pivot table name exists, you'll get an error.
    • Investigating a solution
    • Workaround: delete the Excel file first, then do the export

Author: dfinke
Source Code: https://github.com/dfinke/ImportExcel
License: Apache-2.0 License


Documenter.jl: A Documentation Generator for Julia


A documentation generator for Julia.


The package can be installed with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add Documenter

Or, equivalently, via the Pkg API:

julia> import Pkg; Pkg.add("Documenter")


  • STABLE: documentation of the most recently tagged version.
  • DEVEL: documentation of the in-development version.

Project Status

The package is tested against, and being developed for, Julia 1.6 and above on Linux, macOS, and Windows.

Questions and Contributions

Usage questions can be posted on the Julia Discourse forum under the documenter tag, in the #documentation channel of the Julia Slack and/or in the JuliaDocs Gitter chat room.

Contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems. The contributing page has a few guidelines that should be followed when opening pull requests and contributing code.

Related packages

There are several packages that extend Documenter in different ways. The JuliaDocs organization maintains:

Other third-party packages that can be combined with Documenter include:

Finally, there are also a few other packages in the Julia ecosystem that are similar to Documenter, but fill a slightly different niche:

Download Details:

Author: JuliaDocs 
Source Code: https://github.com/JuliaDocs/Documenter.jl 
License: MIT license


Kasey Turcotte


Deep Dive Into Pandas DataFrame Join — pd.join()

A deep dive visual tutorial on how to join columns with other data frames in pandas

The join() function of the pandas library is used to join columns of another DataFrame. It can efficiently join columns with another DataFrame on index or on a key column. We can also join multiple DataFrame objects by passing a list. Let’s start by understanding its syntax and parameters. The companion materials for this tutorial can be found under our resources section.

Table of Contents:

  1. Syntax
  2. Create DataFrames
  3. Understanding lsuffix and rsuffix parameters
  4. Joining DataFrames by Index Values
  5. Set index to join DataFrames
  6. Understanding the on parameter
  7. Joining multiple DataFrames
  8. Joining a Series with a DataFrame
  9. Understanding the “how” parameter
  10. Understanding the “sort” parameter
  11. Key Takeaways
  12. Resources
  13. References


Smooks: Extensible Data integration Java Framework for Building XML


This is the Git source code repository for the Smooks project.




Apache Maven 3.2.x


git clone https://github.com/smooks/smooks.git

cd smooks

mvn clean install

Note: You will need both Maven (version 3.2.x) and Git installed on your local machine.


You can also build from the Docker image:

Install Docker.

Run sudo docker build -t smooks github.com/smooks/smooks. This will create a Docker image named smooks that contains the correct build environment and a clone of this Git repo.

Run sudo docker run -i smooks mvn clean install to build the source code.

Getting Started

The easiest way to get started with Smooks is to download and try out the examples. The examples are the recommended base upon which to integrate Smooks into your application.


Smooks is an extensible Java framework for building XML and non-XML data (CSV, EDI, POJOs, etc.) fragment-based applications. It can be used as a lightweight framework on which to hook your own processing logic for a wide range of data formats, but, out of the box, Smooks ships with features that can be used individually or seamlessly together:

Java Binding: Populate POJOs from a source (CSV, EDI, XML, POJOs, etc.). Populated POJOs can either be the final result of a transformation, or serve as a bridge for further transformations like what is seen in template resources which generate textual results such as XML. Additionally, Smooks supports collections (maps and lists of typed data) that can be referenced from expression languages and templates.

Transformation: perform a wide range of data transformations and mappings: XML to XML, CSV to XML, EDI to XML, XML to EDI, XML to CSV, POJO to XML, POJO to EDI, POJO to CSV, etc.

Templating: extensible template-driven transformations, with support for XSLT, FreeMarker, and StringTemplate.

Huge Message Processing: process huge messages (gigabytes!). Split, transform and route fragments to JMS, filesystem, database, and other destinations.

Fragment Enrichment: enrich fragments with data from a database or other data sources.

Complex Fragment Validation: rule-based fragment validation.

Fragment Persistence: read fragments from, and save fragments to, a database with either JDBC, persistence frameworks (like MyBatis, Hibernate, or any JPA compatible framework), or DAOs.

Combine: leverage Smooks’s transformation, routing and persistence functionality for Extract Transform Load (ETL) operations.

Validation: perform basic or complex validation on fragment content. This is more than simple type/value-range validation.

Why Smooks?

Smooks was conceived to perform fragment-based transformations on messages. Supporting fragment-based transformation opened up the possibility of mixing and matching different technologies within the context of a single transformation. This meant that one could leverage distinct technologies for transforming fragments, depending on the type of transformation required by the fragment in question.

In the process of evolving this fragment-based transformation solution, it dawned on us that we were establishing a fragment-based processing paradigm. Concretely, a framework was being built for targeting custom visitor logic at message fragments. A visitor does not need to be restricted to transformation. A visitor could be implemented to apply all sorts of operations on fragments, and therefore, the message as a whole.

Smooks supports a wide range of data structures - XML, EDI, JSON, CSV, POJOs (POJO to POJO!). A pluggable reader interface allows you to plug in a reader implementation for any data format.

Fragment-Based Processing

The primary design goal of Smooks is to provide a framework that isolates and processes fragments in structured data (XML and non-XML) using existing data processing technologies (such as XSLT, plain vanilla Java, Groovy script).

A visitor targets a fragment with the visitor’s resource selector value. The targeted fragment can take in as much or as little of the source stream as you like. A fragment is identified by the name of the node enclosing the fragment. You can target the whole stream using the node name of the root node as the selector or through the reserved #document selector.

Note: The terms fragment and node denote different meanings. It is usually acceptable to use the terms interchangeably because the difference is subtle and, more often than not, irrelevant. A node may be the outer node of a fragment, excluding the child nodes. A fragment is the outer node and all its child nodes along with their character nodes (text, etc.). When a visitor targets a node, it typically means that the visitor can only process the fragment’s outer node as opposed to the fragment as a whole, that is, the outer node and its child nodes.

What’s new in Smooks 2?

Smooks 2 introduces the DFDL cartridge and revamps its EDI cartridge, while dropping support for Java 7 along with a few other notable breaking changes:

DFDL cartridge

DFDL is a specification for describing file formats in XML. The DFDL cartridge leverages Apache Daffodil to parse files and unparse XML. This opens up Smooks to a wide array of data formats like SWIFT, ISO8583, HL7, and many more.

Pipeline support

Compose any series of transformations on an event outside the main execution context before directing the pipeline output to the execution result stream or to other destinations.

Complete overhaul of the EDI cartridge

Rewritten to extend the DFDL cartridge and provide much better support for reading EDI documents

Added functionality to serialize EDI documents

As in previous Smooks versions, incorporated special support for EDIFACT

SAX NG filter

Replaces SAX filter and supersedes DOM filter

Brings with it a new visitor API which unifies the SAX and DOM visitor APIs

Cartridges migrated to SAX NG

Supports XSLT and StringTemplate resources unlike the legacy SAX filter

Mementos: a convenient way to stash and un-stash a visitor’s state during its execution lifecycle

Independent release cycles for all cartridges and one Maven BOM (bill of materials) to track them all

License change

After reaching consensus among our code contributors, we’ve dual-licensed Smooks under LGPL v3.0 and Apache License 2.0. This license change keeps Smooks open source while adopting a permissive stance to modifications.

New Smooks XSD schema (xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd")

Uniform XML namespace declarations: dropped default-selector-namespace and selector-namespace XML attributes in favour of declaring namespaces within the standard xmlns attribute from the smooks-resource-config element.

Removed default-selector attribute from smooks-resource-config element: selectors need to be set explicitly

Dropped Smooks-specific annotations in favour of JSR annotations

Farewell @ConfigParam, @Config, @AppContext, and @StreamResultWriter. Welcome @Inject.

Farewell @Initialize and @Uninitialize. Welcome @PostConstruct and @PreDestroy.

Separate top-level Java namespaces for API and implementation to provide a cleaner and more intuitive package structure: API interfaces and internal classes were relocated to org.smooks.api and org.smooks.engine respectively

Improved XPath support for resource selectors

Functions like not() are now supported

Numerous dependency updates

Maven coordinates change: we are now publishing Smooks artifacts under Maven group IDs prefixed with org.smooks

Replaced the default SAX parser implementation, Apache Xerces, with FasterXML’s Woodstox: benchmarks consistently showed Woodstox outperforming Xerces

Migrating from Smooks 1.7 to 2.0

  1. Smooks 2 no longer supports Java 7. Your application needs to be compiled to at least Java 8 to run Smooks 2.
  2. Replace references to Java packages org.milyn with org.smooks.api, org.smooks.engine, org.smooks.io or org.smooks.support.
  3. Inherit from org.smooks.api.resource.visitor.sax.ng.SaxNgVisitor instead of org.milyn.delivery.sax.SAXVisitor.
  4. Change legacy document root fragment selectors from $document to #document.
  5. Update the Smooks Maven coordinates to match those described in the Maven guide.
  6. Replace ExecutionContext#isDefaultSerializationOn() method calls with ExecutionContext#getContentDeliveryRuntime().getDeliveryConfig().isDefaultSerializationOn().
  7. Replace ExecutionContext#getContext() method calls with ExecutionContext#getApplicationContext().
  8. Replace org.smooks.delivery.dom.serialize.SerializationVisitor references with org.smooks.api.resource.visitor.SerializerVisitor.
  9. Replace org.smooks.cdr.annotation.AppContext annotations with javax.inject.Inject annotations.
  10. Replace org.smooks.cdr.annotation.ConfigParam annotations with javax.inject.Inject annotations:
       • Substitute the @ConfigParam name attribute with the @javax.inject.Named annotation.
       • Wrap java.util.Optional around the field to mimic the behaviour of the @ConfigParam optional attribute.
  11. Replace org.smooks.delivery.annotation.Initialize annotations with javax.annotation.PostConstruct annotations.
  12. Replace org.smooks.delivery.annotation.Uninitialize annotations with javax.annotation.PreDestroy annotations.
  13. Replace references to org.smooks.javabean.DataDecoder with org.smooks.api.converter.TypeConverterFactory.
  14. Replace references to org.smooks.cdr.annotation.Configurator with org.smooks.api.lifecycle.LifecycleManager.
  15. Replace references to org.smooks.javabean.DataDecoderException with org.smooks.api.converter.TypeConverterException.
  16. Replace references to org.smooks.cdr.SmooksResourceConfigurationStore with org.smooks.api.Registry.
  17. Replace references to org.milyn.cdr.SmooksResourceConfiguration with org.smooks.api.resource.config.ResourceConfig.
  18. Replace references to org.milyn.delivery.sax.SAXToXMLWriter with org.smooks.io.DomSerializer.


See the FAQ.


See the Maven guide for details on how to integrate Smooks into your project via Maven.


Smooks is commonly described as a "Transformation Engine". Nonetheless, at its core, Smooks makes no reference to data transformation. The core codebase is designed to hook visitor logic into an event stream produced from a source of some kind. As such, in its most distilled form, Smooks is a Structured Data Event Stream Processor.

An application of a structured data event processor is transformation. In implementation terms, a Smooks transformation solution is a visitor reading the event stream from a source to produce a different representation of the input. However, Smooks’s core capabilities enable much more than transformation. A range of other solutions can be implemented based on the fragment-based processing model:

Java Binding: population of a POJO from the source.

Splitting & Routing: perform complex splitting and routing operations on the source stream, including routing data in different formats (XML, EDI, CSV, POJO, etc.) to multiple destinations concurrently.

Huge Message Processing: declaratively consume (transform, or split and route) huge messages without writing boilerplate code.

Basic Processing Model

Smooks’s fundamental behaviour is to take an input source, such as XML, and from it generate an event stream to which visitors are applied to produce a result such as EDI.

Several source and result types are supported, which equate to different transformation types, including but not limited to:

  • XML to XML
  • XML to POJO
  • POJO to XML
  • POJO to POJO
  • EDI to XML
  • EDI to POJO
  • POJO to EDI
  • CSV to XML
  • CSV to …
  • … to …

Smooks maps the source to the result with the help of a highly-tunable SAX event model. The hierarchical events generated from an XML source (startElement, endElement, etc.) drive the SAX event model, though the event model can be applied just as easily to other structured data sources (EDI, CSV, POJO, etc.). The most important events are typically the before and after visit events. The following illustration conveys the hierarchical nature of these events.
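The before/after nesting can be observed with nothing more than the JDK's own SAX parser. The sketch below is a plain-JDK analogy, not Smooks code: VisitEventTrace is a hypothetical class that simply labels each startElement as a "visitBefore" and each endElement as a "visitAfter" so you can see how the events nest.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Plain-JDK analogy (not Smooks code): label SAX start/end events as
// "visitBefore"/"visitAfter" to show how they nest.
class VisitEventTrace {

    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("visitBefore:" + qName); // element opened, children not yet seen
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("visitAfter:" + qName);  // element closed, all children visited
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        trace("<a><b/><c/></a>").forEach(System.out::println);
    }
}
```

For <a><b/><c/></a>, each child's before/after pair sits inside the parent's pair, and the parent's visitAfter only fires once all of its children have been visited.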


Hello World App

Depending on which events are of interest, one or more of the SaxNgVisitor interfaces need to be implemented to consume the SAX event stream produced from the source.

The following is a hello world app demonstrating how to implement a visitor that is fired on the visitBefore and visitAfter events of a targeted node in the event stream. In this case, Smooks configures the visitor to target element foo:


The visitor implementation is straightforward: one method implementation per event. As shown above, a Smooks config (more about resource-config later on) is written to target the visitor at a node’s visitBefore and visitAfter events.

The Java code executing the hello world app is a two-liner:

Smooks smooks = new Smooks("/smooks/echo-example.xml");
smooks.filterSource(new StreamSource(inputStream));

Observe that in this case the program does not produce a result. The program does not even interact with the filtering process in any way because it does not provide an ExecutionContext to smooks.filterSource(...).

This example illustrated the lower-level mechanics of Smooks’s programming model. In reality, most users are not going to want to solve their problems at this level of detail. Smooks ships with substantial pre-built functionality, that is, pre-built visitors. Visitors are bundled by functionality, and these bundles are called Cartridges.

Smooks Resources

A Smooks execution consumes a source of one form or another (XML, EDI, POJO, JSON, CSV, etc.), and from it generates an event stream that fires different visitors (Java, Groovy, DFDL, XSLT, etc.). The goal of this process can be to produce a new result stream in a different format (data transformation), bind data from the source to POJOs and produce a populated Java object graph (Java binding), produce many fragments (splitting), and so on.

At its core, Smooks views visitors and other abstractions as resources. A resource is applied when a selector matches a node in the event stream. The generality of such a processing model can be daunting from a usability perspective because resources are not tied to a particular domain. To counteract this, Smooks 1.1 introduced an Extensible Configuration Model feature that allows specific resource types to be specified in the configuration using dedicated XSD namespaces of their own. Instead of having a generic resource config such as:

<resource-config selector="order-item">
    <resource type="ftl"><!-- <item> ... </item> --></resource>
</resource-config>

an Extensible Configuration Model allows us to have a domain-specific resource config:

<ftl:freemarker applyOnElement="order-item">
    <ftl:template><!-- <item> ... </item> --></ftl:template>
</ftl:freemarker>

When comparing the above snippets, the latter resource has:

A more strongly typed, domain-specific configuration, and so is easier to read,

Auto-completion support from the user’s IDE because the Smooks 1.1+ configurations are XSD-based, and

No need to set the resource type in its configuration.


Central to how Smooks works is the concept of a visitor. A visitor is a Java class performing a specific task on the targeted fragment, such as applying an XSLT script, binding fragment data to a POJO, validating fragments, etc.


Resource selectors are another central concept in Smooks. A selector chooses the node(s) a visitor should visit, as well as working as a simple opaque lookup value for non-visitor logic.

When the resource is a visitor, Smooks will interpret the selector as an XPath-like expression. There are a number of things to be aware of:

The order in which the XPath expression is applied is the reverse of the normal order, as happens in an XSLT script: Smooks inspects backwards from the targeted fragment node, as opposed to forwards from the root node.

Not all of the XPath specification is supported. A selector supports the following XPath syntax:

text() and attribute value selectors: a/b[text() = 'abc'], a/b[text() = 123], a/b[@id = 'abc'], a/b[@id = 123].

text() is only supported on the last selector step in an expression: a/b[text() = 'abc'] is legal while a/b[text() = 'abc']/c is illegal.

text() is supported only on visitor implementations that implement the AfterVisitor interface alone. If the visitor implements the BeforeVisitor or ChildrenVisitor interfaces, an error will result.

or & and logical operations: a/b[text() = 'abc' and @id = 123], a/b[text() = 'abc' or @id = 123]

Namespaces on both the elements and attributes: a:order/b:address[@b:city = 'NY'].

Note: This requires the namespace prefix-to-URI mappings to be defined. A configuration error will result if they are not defined. Read the namespace declaration section for more details.

Supports = (equals), != (not equals), < (less than), > (greater than).

Index selectors: a/b[3].
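To make the reverse evaluation order concrete, here is a deliberately simplified matcher. It is a hypothetical sketch, not Smooks's selector engine: it handles only plain element-name steps separated by /, assumes each / means "parent of", and ignores predicates, namespaces, and index selectors.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified selector matcher (not Smooks's engine).
// Supports only plain element names separated by '/', where each '/'
// means "parent of". Predicates, namespaces and indexes are ignored.
class ReverseSelectorMatcher {

    /**
     * @param selector    e.g. "order-items/order-item"
     * @param elementPath element names from the root down to the targeted node
     */
    static boolean matches(String selector, List<String> elementPath) {
        String[] steps = selector.split("/");
        if (steps.length > elementPath.size()) {
            return false; // selector is deeper than the node's ancestry
        }
        int node = elementPath.size() - 1; // start at the targeted node...
        for (int i = steps.length - 1; i >= 0; i--, node--) {
            // ...and walk backwards: last step vs. node, previous step vs. parent, etc.
            if (!steps[i].equals(elementPath.get(node))) {
                return false; // stop at the first mismatch
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("order", "order-items", "order-item");
        System.out.println(matches("order-items/order-item", path)); // true
        System.out.println(matches("order/order-item", path));       // false: parent is order-items
    }
}
```

Starting from the targeted node means a mismatch close to the node is detected after a single comparison, and a short selector such as "d/e" needs fewer comparisons than a deep one.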

Namespace Declaration

The xmlns attribute is used to bind a selector prefix to a namespace:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:c="http://c" xmlns:d="http://d">

    <resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
        <!-- ... -->
    </resource-config>

</smooks-resource-list>


Alternatively, namespace prefix-to-URI mappings can be declared using the legacy core config namespace element:

<?xml version="1.0"?>
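<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <core:namespaces>
        <core:namespace prefix="c" uri="http://c"/>
        <core:namespace prefix="d" uri="http://d"/>
    </core:namespaces>

    <resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
        <!-- ... -->
    </resource-config>

</smooks-resource-list>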



Smooks relies on a Reader for ingesting a source and generating a SAX event stream. A reader is any class implementing XMLReader. By default, Smooks uses the XMLReader returned from XMLReaderFactory.createXMLReader(). You can easily implement your own XMLReader to create a non-XML reader that generates the source event stream for Smooks to process:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <reader class="com.acme.ZZZZReader" />

    <!--
        Other Smooks resources, e.g. <jb:bean> configs for
        binding data from the ZZZZ data stream into POJOs....
    -->

</smooks-resource-list>
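As a rough illustration of such a non-XML reader, the sketch below turns simple name,qty CSV lines into SAX events. CsvXmlReader, its element names, and the two-column layout are all hypothetical. It extends the JDK's org.xml.sax.helpers.XMLFilterImpl purely as a convenient XMLReader base class and overrides parse(InputSource), so no underlying XML parser is involved.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical non-XML reader: each "name,qty" CSV line becomes
// <record><name>..</name><qty>..</qty></record> inside a <csv> root.
// XMLFilterImpl serves only as an XMLReader base class; parse(InputSource)
// is overridden, so no parent XML parser is used.
class CsvXmlReader extends XMLFilterImpl {
    private static final String[] FIELDS = {"name", "qty"};
    private static final Attributes NO_ATTRIBUTES = new AttributesImpl();

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        ContentHandler handler = getContentHandler();
        BufferedReader lines = new BufferedReader(input.getCharacterStream());
        handler.startDocument();
        handler.startElement("", "csv", "csv", NO_ATTRIBUTES);
        for (String line = lines.readLine(); line != null; line = lines.readLine()) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",", -1);
            handler.startElement("", "record", "record", NO_ATTRIBUTES);
            for (int i = 0; i < FIELDS.length && i < values.length; i++) {
                char[] text = values[i].trim().toCharArray();
                handler.startElement("", FIELDS[i], FIELDS[i], NO_ATTRIBUTES);
                handler.characters(text, 0, text.length);
                handler.endElement("", FIELDS[i], FIELDS[i]);
            }
            handler.endElement("", "record", "record");
        }
        handler.endElement("", "csv", "csv");
        handler.endDocument();
    }

    /** Demo helper: echoes the events as minimal XML text (no escaping). */
    static String toXml(String csv) {
        StringBuilder out = new StringBuilder();
        CsvXmlReader reader = new CsvXmlReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                out.append('<').append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(csv)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Whatever consumes the events cannot tell that the source was never XML, which is the property a custom reader relies on.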


The reader config element references a user-defined XMLReader. It can be configured with a set of handlers, features and parameters:

<reader class="com.acme.ZZZZReader">
        <handler class="com.X" />
        <handler class="com.Y" />
        <setOn feature="http://a" />
        <setOn feature="http://b" />
        <setOff feature="http://c" />
        <setOff feature="http://d" />
        <param name="param1">val1</param>
        <param name="param2">val2</param>
</reader>

Packaged Smooks modules, known as cartridges, provide support for non-XML readers but, by default, Smooks expects an XML source. Omit the class name from the reader element to set features on the default XML reader:

<reader>
        <setOn feature="http://a" />
        <setOn feature="http://b" />
        <setOff feature="http://c" />
        <setOff feature="http://d" />
</reader>
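In plain SAX terms, setOn and setOff presumably correspond to XMLReader#setFeature(uri, true) and XMLReader#setFeature(uri, false). The sketch below shows those calls on a JDK reader obtained from SAXParserFactory; that choice of reader, the class name ReaderFeatures, and the two standard SAX2 feature URIs used in place of the placeholder http://a … http://d URIs are assumptions for illustration only.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Sketch: <setOn>/<setOff> expressed as XMLReader#setFeature calls on a JDK
// reader. The two URIs are standard SAX2 features, used here in place of the
// placeholder URIs from the config above.
class ReaderFeatures {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    static XMLReader configuredReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);          // comparable to <setOn feature="..."/>
            reader.setFeature(NAMESPACE_PREFIXES, false); // comparable to <setOff feature="..."/>
            return reader;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads a feature, wrapping SAX's checked exceptions. */
    static boolean isOn(XMLReader reader, String feature) {
        try {
            return reader.getFeature(feature);
        } catch (SAXException e) { // SAXNotRecognizedException / SAXNotSupportedException
            throw new IllegalStateException(e);
        }
    }
}
```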


Smooks can present output to the outside world in two ways:

As instances of Result: client code extracts output from the Result instance after passing an empty one to Smooks#filterSource(...).

As side effects: during filtering, resource output is sent to web services, local storage, queues, data stores, and other locations. Events trigger the routing of fragments to external endpoints such as what happens when splitting and routing.

Unless configured otherwise, a Smooks execution does not accumulate the input data to produce all the outputs. The reason is simple: performance! Consider a document consisting of hundreds of thousands (or millions) of orders that need to be split up and routed to different systems in different formats, based on different conditions. The only way of handling documents of these magnitudes is by streaming them.

Important: Smooks can generate output in either, or both, of the above ways, all in a single filtering pass of the source. It does not need to filter the source multiple times to generate multiple outputs, which is critical for performance.


A look at the Smooks API reveals that Smooks can be supplied with multiple Result instances:

public void filterSource(Source source, Result... results) throws SmooksException

Smooks can work with the standard JDK StreamResult and DOMResult result types, as well as the Smooks specific ones:

JavaResult: result type for capturing the contents of the Smooks JavaBean context.

StringResult: StreamResult extension wrapping a StringWriter, useful for testing.

Important: As yet, Smooks does not support capturing output to multiple Result instances of the same type. For example, you can specify multiple StreamResult instances in Smooks.filterSource(...) but Smooks will only output to the first StreamResult instance.

Stream Results

The StreamResult and DOMResult types receive special attention from Smooks. When the default.serialization.on global parameter is turned on (which it is by default), Smooks serializes the stream of events to XML while filtering the source. The XML is fed to the Result instance if a StreamResult or DOMResult is passed to Smooks#filterSource.

Note: This is the mechanism used to perform a standard 1-input/1-xml-output character-based transformation.
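Outside Smooks, the same 1-input/1-xml-output shape can be reproduced with the JDK's TrAX API. This is an analogy only: an identity Transformer copies a StreamSource into a StreamResult that wraps a StringWriter, which is roughly what StringResult provides. IdentitySerialization is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// JDK analogy (not Smooks): one XML input, one serialized XML output.
class IdentitySerialization {
    static String serialize(String xml) {
        try {
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter(); // the kind of writer StringResult wraps
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```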

Side Effects

Smooks is also able to generate different types of output during filtering, that is, while filtering the source event stream but before it reaches the end of the stream. A classic example of this output type is when it is used to split and route fragments to different endpoints for processing by other processes.


A pipeline is a flexible, yet simple, Smooks construct that isolates the processing of a targeted event from its main processing as well as from the processing of other pipelines. In practice, this means being able to compose any series of transformations on an event outside the main execution context before directing the pipeline output to the execution result stream or to other destinations. With pipelines, you can enrich data, rename/remove nodes, and much more.

Under the hood, a pipeline is just another instance of Smooks, made self-evident from the Smooks config element declaring a pipeline:

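<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

   <core:smooks filterSourceOn="...">
      <core:action>
         ...
      </core:action>
      <core:config>
         <smooks-resource-list>
            ...
         </smooks-resource-list>
      </core:config>
   </core:smooks>

</smooks-resource-list>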


core:smooks fires a nested Smooks execution whenever an event in the stream matches the filterSourceOn selector. The pipeline within the inner smooks-resource-list element visits the selected event and its child events. It is worth highlighting that the inner smooks-resource-list element behaves identically to the outer one, and therefore, it accepts resources like visitors, readers, and even pipelines (a pipeline within a pipeline!). Moreover, a pipeline is transparent to its nested resources: a resource’s behaviour remains the same whether it’s declared inside a pipeline or outside it.

The optional core:action element tells the nested Smooks instance what to do with the pipeline’s output. The next sections list the supported actions.


Merges the pipeline’s output with the result stream:


As described in the subsequent sections, an inline action replaces, prepends, or appends content.


Substitutes the selected fragment with the pipeline output:


Prepend Before

Adds the output before the selector start tag:


Prepend After

Adds the output after the selector start tag:


Append Before

Adds the output before the selector end tag:


Append After

Adds the output after the selector end tag:

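The five inline positions are easiest to compare side by side. The sketch below is a hypothetical, string-level picture only: Smooks operates on events rather than on strings, and InlineActions and its enum constants are invented names. Given a selected fragment split into start tag, body, and end tag, it shows where the pipeline output lands for each inline action.

```java
// Hypothetical, string-level illustration of where each inline action places
// the pipeline output relative to the selected fragment. Smooks itself works
// on events, not strings; this only visualises the five positions.
class InlineActions {
    enum Action { REPLACE, PREPEND_BEFORE, PREPEND_AFTER, APPEND_BEFORE, APPEND_AFTER }

    static String apply(Action action, String startTag, String body, String endTag, String output) {
        switch (action) {
            case REPLACE:        return output;                            // fragment substituted
            case PREPEND_BEFORE: return output + startTag + body + endTag; // before the start tag
            case PREPEND_AFTER:  return startTag + output + body + endTag; // after the start tag
            case APPEND_BEFORE:  return startTag + body + output + endTag; // before the end tag
            case APPEND_AFTER:   return startTag + body + endTag + output; // after the end tag
            default: throw new IllegalArgumentException(action.toString());
        }
    }
}
```

For the fragment <b>x</b> and the output <o/>, prepend after yields <b><o/>x</b> while append before yields <b>x<o/></b>.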

Bind To

Binds the output to the execution context’s bean store:

    <core:bind-to id="..."/>

Output To

Directs the output to a stream other than the result stream:

    <core:output-to outputStreamResource="..."/>


The basic functionality of Smooks can be extended through the development of a Smooks cartridge. A cartridge is a Java archive (JAR) containing reusable resources (also known as Content Handlers). A cartridge augments Smooks with support for a specific type of input source or event handling.

Visit the GitHub organisation page for the complete list of Smooks cartridges.


A Smooks filter delivers generated events from a reader to the application’s resources. Smooks 1 had the DOM and SAX filters. The DOM filter was simple to use but kept all the events in memory while the SAX filter, though more complex, delivered the events in streaming fashion. Having two filter types meant two different visitor APIs and execution paths, with all the baggage it entailed.

Smooks 2 unifies the legacy DOM and SAX filters without sacrificing convenience or performance. The new SAX NG filter drops the API distinction between DOM and SAX. Instead, the filter streams SAX events as partial DOM elements to SAX NG visitors targeting the element. A SAX NG visitor can read the targeted node as well as any of the node’s ancestors but not the targeted node’s children or siblings in order to keep the memory footprint to a minimum.

The SAX NG filter can mimic DOM by setting its max.node.depth parameter to 0 (default value is 1), allowing each visitor to process the complete DOM tree in its visitAfter(...) method:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <params>
        <param name="max.node.depth">0</param>
    </params>
    <!-- ... -->
</smooks-resource-list>

A max.node.depth value greater than 1 tells the filter to read and keep a node’s descendants up to the desired depth. Take the following input as an example:

<order id="332">
        <customer number="123">Joe</customer>
        <order-item id="1">
        <order-item id="2">
        <order-item id="3">

Along with the config:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <params>
        <param name="max.node.depth">2</param>
    </params>

    <resource-config selector="order-item">
        <!-- ... -->
    </resource-config>
</smooks-resource-list>


At any given time, there will always be a single order-item in memory containing product because max.node.depth is 2. Each new order-item overwrites the previous order-item to minimise the memory footprint. MyVisitor#visitAfter(...) is invoked 3 times, each invocation corresponding to an order-item fragment. The first invocation will process:

<order-item id='1'>

While the second invocation will process:

<order-item id='2'>

Whereas the last invocation will process:

<order-item id='3'>
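The memory behaviour described above can be approximated with plain JDK SAX. The sketch below is not the SAX NG filter and does not model max.node.depth: OrderItemStreamer (a hypothetical name) buffers only the subtree of the current target element, hands it to a visitAfter-style callback when the element closes, and then drops it, so at most one order-item is held at a time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Plain-JDK sketch (not the SAX NG filter; max.node.depth is not modelled):
// only the subtree of the current target element is buffered. It is passed to
// a visitAfter-style callback when the element closes and then discarded.
class OrderItemStreamer extends DefaultHandler {
    private final String target;
    private final Consumer<String> visitAfter;
    private StringBuilder fragment; // null while outside a target element

    OrderItemStreamer(String target, Consumer<String> visitAfter) {
        this.target = target;
        this.visitAfter = visitAfter;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals(target)) {
            fragment = new StringBuilder(); // fresh buffer; the previous one is already gone
        }
        if (fragment != null) {
            fragment.append('<').append(qName);
            for (int i = 0; i < attrs.getLength(); i++) {
                fragment.append(' ').append(attrs.getQName(i))
                        .append("='").append(attrs.getValue(i)).append('\'');
            }
            fragment.append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (fragment != null) {
            fragment.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (fragment == null) {
            return; // outside any target element: nothing is kept
        }
        fragment.append("</").append(qName).append('>');
        if (qName.equals(target)) {
            visitAfter.accept(fragment.toString());
            fragment = null; // drop the fragment so memory use stays flat
        }
    }

    /** Demo helper: collects each visited fragment (a real consumer would not keep them). */
    static List<String> stream(String xml, String target) {
        List<String> visited = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new OrderItemStreamer(target, visited::add));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return visited;
    }
}
```

Collecting every fragment into a list, as stream(...) does, is only for demonstration; a real consumer would process each fragment and let it go.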

Programmatically, implementing org.smooks.api.resource.visitor.sax.ng.ParameterizedVisitor will give you fine-grained control over the visitor’s targeted element depth:

public class DomVisitor implements ParameterizedVisitor {

    public void visitBefore(Element element, ExecutionContext executionContext) {
    }

    public void visitAfter(Element element, ExecutionContext executionContext) {
        System.out.println("Element: " + XmlUtil.serialize(element, true));
    }

    // Maximum tree depth of the targeted element passed to visitAfter(...);
    // Integer.MAX_VALUE keeps the element's whole subtree.
    public int getMaxNodeDepth() {
        return Integer.MAX_VALUE;
    }
}

ParameterizedVisitor#getMaxNodeDepth() returns an integer denoting the targeted element’s maximum tree depth the visitor can accept in its visitAfter(...) method.


Filter-specific knobs are set through the smooks-core configuration namespace (https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd) introduced in Smooks 1.3:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <core:filterSettings type="SAX NG"
                         defaultSerialization="true"
                         terminateOnException="true"
                         closeSource="true"
                         closeResult="true"
                         rewriteEntities="true"
                         readerPoolSize="3"/>

    <!-- Other visitor configs etc... -->

</smooks-resource-list>

type (default: SAX NG): the type of processing model that will be used. SAX NG is the recommended type. The DOM type is deprecated.

defaultSerialization (default: true): whether default serialization should be switched on. Turning default serialization on tells Smooks to locate a StreamResult (or DOMResult) in the Result objects provided to the Smooks.filterSource method and to serialize all events to that Result instance. This behavior can be turned off using this global configuration parameter and can be overridden on a per-fragment basis by targeting a visitor at that fragment that takes ownership of the org.smooks.io.FragmentWriter object.

terminateOnException (default: true): whether an exception should terminate execution.

closeSource (default: true): close Source instance streams passed to the Smooks.filterSource method. The exception here is System.in, which will never be closed.

closeResult (default: true): close Result streams passed to the Smooks.filterSource method. The exceptions here are System.out and System.err, which will never be closed.

rewriteEntities: rewrite XML entities when reading and writing (default serialization) XML.

readerPoolSize (default: 0): reader pool size. Some Reader implementations are very expensive to create (e.g. Xerces). Pooling (that is, reusing) Reader instances can result in a huge performance improvement, especially when processing lots of "small" messages. The default value of 0 means unpooled: a new Reader instance is created for each message. Configure this in line with your application’s threading model.
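To illustrate the idea behind readerPoolSize, here is a minimal pool sketch. ReaderPool is hypothetical and says nothing about how Smooks implements pooling: it keeps up to size idle instances in an ArrayBlockingQueue, creates a new instance when none is idle, and behaves as unpooled when size is 0.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical pool (not Smooks's implementation) illustrating readerPoolSize.
class ReaderPool<R> {
    private final BlockingQueue<R> idle; // null when size is 0, i.e. unpooled
    private final Supplier<R> factory;

    ReaderPool(int size, Supplier<R> factory) {
        this.idle = size > 0 ? new ArrayBlockingQueue<R>(size) : null;
        this.factory = factory;
    }

    /** Takes an idle instance if one is available, otherwise creates a new one. */
    R borrow() {
        R reader = (idle == null) ? null : idle.poll();
        return reader != null ? reader : factory.get();
    }

    /** Returns an instance; it is simply dropped if the pool is full or disabled. */
    void release(R reader) {
        if (idle != null) {
            idle.offer(reader);
        }
    }
}
```

Because ArrayBlockingQueue is thread-safe, borrow and release can be called from concurrent filtering threads; sizing the pool to the number of such threads mirrors the advice to configure it in line with the application’s threading model.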


Smooks streams events that can be captured and inspected while in flight or after execution. HtmlReportGenerator is one such class: it inspects in-flight events to generate an HTML report from the execution:

Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
ExecutionContext executionContext = smooks.createExecutionContext();

executionContext.getContentDeliveryRuntime().addExecutionEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
smooks.filterSource(executionContext, new StreamSource(inputStream), new StreamResult(outputStream));

HtmlReportGenerator is a useful tool in the developer’s arsenal for diagnosing issues, or for comprehending a transformation.

An example HtmlReportGenerator report can be seen online here.

Of course you can also write and use your own ExecutionEventListener implementations.

Caution: Only use the HtmlReportGenerator in development. When enabled, it incurs a significant performance overhead and, with large messages, can even cause an OutOfMemoryError.


You can terminate Smooks’s filtering before it reaches the end of a stream. The following config terminates filtering at the end of the customer fragment:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <!-- Visitors... -->
    <core:terminate onElement="customer"/>

</smooks-resource-list>


The default behavior is to terminate at the end of the targeted fragment, on the visitAfter event. To terminate at the start of the targeted fragment, on the visitBefore event, set the terminateBefore attribute to true:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <!-- Visitors... -->
    <core:terminate onElement="customer" terminateBefore="true"/>

</smooks-resource-list>
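In plain SAX code, a common way to stop parsing early is to throw a dedicated exception from a handler callback and catch it around parse(). The sketch below uses that technique to mimic onElement and terminateBefore; TerminatingHandler and its TerminateException are hypothetical, and nothing here describes how core:terminate is implemented inside Smooks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical JDK-only sketch: abort a SAX parse at a given element by
// throwing a dedicated exception from the handler and catching it outside.
class TerminatingHandler extends DefaultHandler {

    /** Signals a deliberate early stop, not an error. */
    static final class TerminateException extends SAXException {
        TerminateException() {
            super("terminated");
        }
    }

    private final String onElement;
    private final boolean terminateBefore;
    final List<String> seen = new ArrayList<>();

    TerminatingHandler(String onElement, boolean terminateBefore) {
        this.onElement = onElement;
        this.terminateBefore = terminateBefore;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
        if (terminateBefore && qName.equals(onElement)) {
            throw new TerminateException(); // stop at the start of the fragment
        }
        seen.add(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!terminateBefore && qName.equals(onElement)) {
            throw new TerminateException(); // default: stop at the end of the fragment
        }
    }

    static List<String> elementsSeen(String xml, String onElement, boolean terminateBefore) {
        TerminatingHandler handler = new TerminatingHandler(onElement, terminateBefore);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (TerminateException expected) {
            // normal early exit: the rest of the document is never read
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }
}
```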


Bean Context

The Bean Context is a container for objects that can be accessed during a Smooks execution. One bean context is created per execution context, that is, per Smooks#filterSource(...) operation. Provide an org.smooks.io.payload.JavaResult object to Smooks#filterSource(...) if you want the contents of the bean context to be returned at the end of the filtering process:

//Get the data to filter
StreamSource source = new StreamSource(getClass().getResourceAsStream("data.xml"));

//Create a Smooks instance (cachable)
Smooks smooks = new Smooks("smooks-config.xml");

//Create the JavaResult, which will contain the filter result after filtering
JavaResult result = new JavaResult();

//Filter the data from the source, putting the result into the JavaResult
smooks.filterSource(source, result);

//Getting the Order bean which was created by the JavaBean cartridge
Order order = (Order)result.getBean("order");

Resources like visitors access the bean context’s beans at runtime from the BeanContext, which is retrieved from ExecutionContext#getBeanContext(). When adding or retrieving objects from the BeanContext, you should first retrieve a BeanId from the BeanIdStore. A BeanId is a special key that ensures higher performance than String keys, although String keys are also supported. The BeanIdStore must be retrieved from ApplicationContext#getBeanIdStore(). A BeanId object can be created by calling BeanIdStore#register(String). If you know that the BeanId is already registered, you can retrieve it by calling BeanIdStore#getBeanId(String). BeanId is scoped at the application context. You normally register it in the @PostConstruct annotated method of your visitor implementation and then reference it as a member variable from the visitBefore and visitAfter methods.

Note: BeanId and BeanIdStore are thread-safe.
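The performance argument for BeanId can be sketched as follows. SketchBeanId, SketchBeanIdStore, and SketchBeanContext are hypothetical stand-ins, not the Smooks BeanId, BeanIdStore, or BeanContext classes: a name is registered once and mapped to a small integer, and each per-execution context stores beans in a list indexed by that integer, so a lookup is a list access rather than String hashing and comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins (not the Smooks classes) for the BeanId idea.

/** Application-scoped key: a bean name mapped to a small integer index. */
final class SketchBeanId {
    final int index;
    final String name;

    SketchBeanId(int index, String name) {
        this.index = index;
        this.name = name;
    }
}

/** Application-scoped registry; synchronized so it can be shared by threads. */
final class SketchBeanIdStore {
    private final Map<String, SketchBeanId> ids = new HashMap<>();

    synchronized SketchBeanId register(String name) {
        SketchBeanId id = ids.get(name);
        if (id == null) {
            id = new SketchBeanId(ids.size(), name); // next free index
            ids.put(name, id);
        }
        return id;
    }

    synchronized SketchBeanId getBeanId(String name) {
        return ids.get(name); // null if never registered
    }
}

/** Per-execution bean store: index-based lookups, no String hashing. */
final class SketchBeanContext {
    private final List<Object> beans = new ArrayList<>();

    void add(SketchBeanId id, Object bean) {
        while (beans.size() <= id.index) {
            beans.add(null); // grow until the id's slot exists
        }
        beans.set(id.index, bean);
    }

    Object get(SketchBeanId id) {
        return id.index < beans.size() ? beans.get(id.index) : null;
    }
}
```

Registration is the only step that touches the String map, which is why it fits a one-off initialization hook, while add and get run on every execution.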

Pre-installed Beans

A number of pre-installed beans are available in the bean context at runtime:

PUUID: This UniqueId instance provides unique identifiers for the filtering ExecutionContext.

PTIME: This Time instance provides time-based data for the filtering ExecutionContext.

The following are examples of how each of these would be used in a FreeMarker template.

Unique ID of the ExecutionContext:


Random Unique ID:


Filtering start time in milliseconds:


Filtering start time in nanoseconds:


Filtering start date:


Current time in milliseconds:


Current time in nanoSeconds:


Current date:


Global Configurations

Global configuration settings are, as the name implies, configuration options that can be set once and be applied to all resources in a configuration.

Smooks supports two types of globals, default properties and global parameters:

Global Configuration Parameters: Every <resource-config> in a Smooks configuration can specify <param> elements for configuration parameters. These parameter values are available at runtime through the ResourceConfig, or are reflectively injected through the @Inject annotation. Global Configuration Parameters are parameters that are defined centrally (see below) and are accessible to all runtime components via the ExecutionContext (as opposed to the ResourceConfig). More on this in the following sections.

Default Properties: specify default values for attributes. These defaults are automatically applied to ResourceConfig instances when their corresponding <resource-config> element does not specify the attribute. More on this in the following section.

Global Configuration Parameters

Global parameters differ from default properties in that they are not specified on the root element and are not automatically applied to resources.

Global parameters are specified in a <params> element:

<params>
    <param name="xyz.param1">param1-val</param>
</params>

Global Configuration Parameters are accessible via the ExecutionContext e.g.:

public void visitAfter(Element element, ExecutionContext executionContext) {
    // "defaultValueABC" is used when xyz.param1 is not configured
    String param1 = executionContext.getConfigParameter("xyz.param1", "defaultValueABC");
    // ...
}

Default Properties

Default properties are properties that can be set on the root element of a Smooks configuration and that are applied to all resource configurations in the smooks-conf.xml file. For example, if you have a resource configuration file in which all the resource configurations have the same target profile, you could specify default-target-profile=order to save specifying the profile on every resource configuration:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      default-target-profile="order">

    <!-- ... -->

</smooks-resource-list>

The following default configuration options are available:

default-target-profile: default target profile that will be applied to all resources in the Smooks configuration file where a target-profile is not defined.

default-condition-ref: refers to a global condition by the condition’s id. This condition is applied to resources that define an empty "condition" element (i.e. <condition/>) that does not reference a globally defined condition.

Configuration Modularization

Smooks configurations are easily modularized through use of the <import> element. This allows you to split Smooks configurations into multiple reusable configuration files and then compose the top level configurations using the <import> element e.g.

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <import file="bindings/order-binding.xml" />
    <import file="templates/order-template.xml" />

</smooks-resource-list>


You can also inject replacement tokens into the imported configuration by using <param> sub-elements on the <import>. This allows you to make tweaks to the imported configuration.

<!-- Top level configuration... -->
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <import file="bindings/order-binding.xml">
        <param name="orderRootElement">order</param>
    </import>

</smooks-resource-list>

<!-- Imported parameterized bindings/order-binding.xml configuration... -->
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"

    <jb:bean beanId="order" class="org.acme.Order" createOnElement="@orderRootElement@">


Note how the replacement token injection points are specified using @tokenname@.
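The token mechanism can be pictured as a plain text substitution over the imported file. ImportTokens is a hypothetical sketch, not Smooks's implementation; it assumes token names consist of letters, digits, '_', '.' and '-', and it leaves tokens without a matching <param> untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of @tokenname@ substitution (not Smooks's own code).
class ImportTokens {
    // Assumed token syntax: '@', a name of letters/digits/'_'/'.'/'-', '@'.
    private static final Pattern TOKEN = Pattern.compile("@([A-Za-z0-9_.-]+)@");

    static String substitute(String config, Map<String, String> params) {
        Matcher m = TOKEN.matcher(config);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group(); // leave unknown tokens as-is
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With orderRootElement set to order, createOnElement="@orderRootElement@" in the imported bindings becomes createOnElement="order".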

Exporting Results

When using Smooks standalone, you are in full control of the type of output that Smooks produces, since you specify it by passing a certain Result to the filter method. But when integrating Smooks with other frameworks (JBossESB, Mule, Camel, and others), this needs to be specified inside the framework’s configuration. Since Smooks 1.4, you can declare the data types that Smooks produces and use the Smooks API to retrieve the Result(s) that Smooks exports.

To declare the type of result that Smooks produces you use the 'exports' element as shown below:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd" xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">
    <core:exports>
        <core:result type="org.smooks.io.payload.JavaResult"/>
    </core:exports>
</smooks-resource-list>

The exports element declares the results that are produced by this Smooks configuration. An exports element can contain one or more result elements. A framework that uses Smooks could then perform filtering like this:

// Get the Exported types that were configured.
Exports exports = Exports.getExports(smooks.getApplicationContext());
if (exports.hasExports()) {
    // Create the instances of the Result types.
    // (Only the types, i.e. the Class types, are declared in the 'type' attribute.)
    Result[] results = exports.createResults();
    smooks.filterSource(executionContext, getSource(exchange), results);
    // The Result(s) will now be populated by the Smooks filtering process and
    // available to the framework in question.
}

There might also be cases where you only want a portion of the result extracted and returned. You can use the ‘extract’ attribute to specify this:

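<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd" xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">
    <core:exports>
        <core:result type="org.smooks.io.payload.JavaResult" extract="orderBean"/>
    </core:exports>
</smooks-resource-list>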

The extract attribute is intended to be used when you are only interested in a sub-section of a produced result. In the example above we are saying that we only want the object named orderBean to be exported. The other contents of the JavaResult will be ignored. Another example where you might want to use this kind of extracting could be when you only want a ValidationResult of a certain type, for example to only return validation errors.

Below is an example of using the extract attribute from an embedded framework:

// Get the Exported types that were configured.
Exports exports = Exports.getExports(smooks.getApplicationContext());
if (exports.hasExports()) {
    // Create the instances of the Result types.
    // (Only the types, i.e. the Class types, are declared in the 'type' attribute.)
    Result[] results = exports.createResults();
    smooks.filterSource(executionContext, getSource(exchange), results);
    List<Object> objects = Exports.extractResults(results, exports);
    // Now make the objects available to the framework that this code is running in:
    // Camel, JBossESB, Mule, etc...
}

Performance Tuning

As with any software, when Smooks is configured or used incorrectly, performance can be one of the first things to suffer.


Cache and reuse the Smooks Object. Initialization of Smooks takes some time and therefore it is important that it is reused.

Pool reader instances where possible. This can result in a huge performance boost, as some readers are very expensive to create.

If possible, use SAX NG filtering. However, you need to check that all Smooks cartridges in use are SAX NG compatible. SAX NG processing is faster than DOM processing and has a consistently small memory footprint. It is especially recommended for processing large messages. See the Filtering Process Selection (DOM or SAX?) section. SAX NG is the default filter since Smooks 2.

Turn off debug logging. Smooks performs some intensive debug logging in parts of the code, which can add significant processing overhead and lower throughput. Also remember that having no logging configured at all may result in debug log statements being executed.

Contextual selectors can have a negative effect on performance: evaluating a match for a selector like "a/b/c/d/e" requires more processing than for a selector like "d/e". There will be situations where your data model requires deep selectors, but where it does not, keep your selectors short for the sake of performance.
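To see why deeper selectors cost more, here is a minimal sketch of suffix matching against the stack of currently open elements. It is a hypothetical illustration (the `SelectorMatchSketch` class and its comparison counter are invented here), not the Smooks selector engine, which supports far more than plain slash-separated names; the point is only that every extra selector step adds one more comparison for each candidate element.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch, not the Smooks selector implementation: a selector such as
// "a/b/c/d/e" matches when its steps equal the innermost elements of the current path.
final class SelectorMatchSketch {

    /** Number of element-name comparisons made by the last call to matches() (not thread-safe). */
    static int comparisons;

    /**
     * @param selector     slash-separated element names, e.g. "d/e"
     * @param openElements element names from the root (first) to the current element (last)
     */
    static boolean matches(String selector, Deque<String> openElements) {
        String[] steps = selector.split("/");
        comparisons = 0;
        if (steps.length > openElements.size()) {
            return false; // the selector is deeper than the current path
        }
        // Walk the selector backwards and the path from the innermost element outwards.
        Iterator<String> path = openElements.descendingIterator();
        for (int i = steps.length - 1; i >= 0; i--) {
            comparisons++;
            if (!steps[i].equals(path.next())) {
                return false;
            }
        }
        return true;
    }

    static Deque<String> path(String... names) {
        Deque<String> deque = new ArrayDeque<>();
        for (String name : names) {
            deque.addLast(name);
        }
        return deque;
    }
}
```

With the path `a/b/c/d/e` open, matching `a/b/c/d/e` performs five comparisons, while `d/e` performs two.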

Smooks Cartridges

Every cartridge can have its own performance optimization tips.

Javabean Cartridge

If possible, don’t use the Virtual Bean Model. Create beans instead of maps. Creating and adding data to Maps is a lot slower than creating simple POJOs and calling the setter methods.


Unit Testing

Unit testing with Smooks is simple:

public class MyMessageTransformTest {
    @Test
    public void test_transform() throws Exception {
        Smooks smooks = new Smooks(getClass().getResourceAsStream("smooks-config.xml"));

        try {
            Source source = new StreamSource(getClass().getResourceAsStream("input-message.xml"));
            StringResult result = new StringResult();

            smooks.filterSource(source, result);

            // compare the expected xml with the transformation result.
            XMLAssert.assertXMLEqual(new InputStreamReader(getClass().getResourceAsStream("expected.xml")), new StringReader(result.getResult()));
        } finally {
            // release the resources held by the Smooks instance.
            smooks.close();
        }
    }
}

The test case above uses XMLUnit, so XMLUnit needs to be declared as a test-scoped Maven dependency.


Common use cases

Processing Huge Messages (GBs)

One of the main features introduced in Smooks v1.0 is the ability to process huge messages (GBs in size). Smooks supports the following types of processing for huge messages:

One-to-One Transformation: This is the process of transforming a huge message from its source format (e.g. XML), to a huge message in a target format e.g. EDI, CSV, XML etc.

Splitting & Routing: Splitting of a huge message into smaller (more consumable) messages in any format (EDI, XML, Java, etc.) and Routing of those smaller messages to a number of different destination types (filesystem, JMS, database).

Persistence: Persisting the components of the huge message to a database, from where they can be more easily queried and processed. Within Smooks, we consider this to be a form of Splitting and Routing (routing to a database).

All of the above is possible without writing any code (i.e. in a declarative manner). Typically, any of the above types of processing would have required writing quite a bit of ugly, unmaintainable code. It might also have been implemented as a multi-stage process where the huge message is split into smaller messages (stage #1) and then each smaller message is processed in turn to persist, route, etc. (stage #2). This would all be done in an effort to make that ugly, unmaintainable code a little more maintainable and reusable. With Smooks, most of these use cases can be handled without writing any code. They can also be handled in a single pass over the source message, splitting and routing in parallel (plus routing to multiple destinations of different types and in different formats).

Note: Be sure to read the section on Java Binding.

One-to-One Transformation

If the requirement is to process a huge message by transforming it into a single message of another format, the easiest mechanism with Smooks is to apply multiple FreeMarker templates to the Source message Event Stream, outputting to a Smooks.filterSource Result stream.

This can be done in one of 2 ways with FreeMarker templating, depending on the type of model that’s appropriate:

Using FreeMarker + NodeModels for the model.

Using FreeMarker + a Java Object model for the model. The model can be constructed from data in the message, using the Javabean Cartridge.

Option #1 above is obviously the option of choice, if the tradeoffs are OK for your use case. Please see the FreeMarker Templating docs for more details.

The following images show an order message, as well as the message into which we need to transform it:


Imagine a situation where the order message contains millions of order-item elements. Processing a huge message in this way with Smooks and FreeMarker (using NodeModels) is quite straightforward. Because the message is huge, we need to identify multiple NodeModels in the message, such that the runtime memory footprint is as low as possible. We cannot process the message using a single model, as the full message is just too big to hold in memory. In the case of the order message, there are 2 models: one for the main order data (blue highlight) and one for the order-item data (beige highlight):


So in this case, the most data that will be in memory at any one time is the main order data, plus one of the order-items. Because the NodeModels are nested, Smooks makes sure that the order data NodeModel never contains any of the data from the order-item NodeModels. Also, as Smooks filters the message, the order-item NodeModel will be overwritten for every order-item (i.e. they are not collected). See SAX NG.
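The memory profile described above can be illustrated without Smooks or FreeMarker. The following plain-JDK StAX sketch is hypothetical: the element names `header`, `customer`, `product` and `quantity`, and the `sales-order` output format, are assumptions, since the message images are not reproduced here. It keeps only the header value and the current order-item in memory, and writes each output item as soon as its source order-item closes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical plain-JDK sketch (not Smooks): stream an order message and write the
// output as we go, holding only the header value and the current order-item in memory.
final class StreamingOrderTransform {

    static void transform(String orderXml, Writer out) throws XMLStreamException, IOException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(orderXml));
        try {
            String customer = null;                         // "order" model: header data only
            Map<String, String> item = new HashMap<>();     // "order-item" model: reused per item
            boolean inItem = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("order-items")) {
                        // Output that precedes the items (like the first half of a split template).
                        out.write("<sales-order customer=\"" + customer + "\">\n");
                    } else if (name.equals("order-item")) {
                        inItem = true;
                        item.clear();                       // overwrite, never accumulate
                    } else if (name.equals("customer")) {
                        customer = reader.getElementText(); // moves the cursor to </customer>
                    } else if (inItem) {
                        item.put(name, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("order-item")) {
                        inItem = false;
                        // Values are written without XML escaping to keep the sketch short.
                        out.write("  <item product=\"" + item.get("product")
                                + "\" quantity=\"" + item.get("quantity") + "\"/>\n");
                    } else if (name.equals("order-items")) {
                        out.write("</sales-order>\n");     // output that follows the items
                    }
                }
            }
        } finally {
            reader.close();
        }
    }

    static String transform(String orderXml) {
        StringWriter out = new StringWriter();
        try {
            transform(orderXml, out);
        } catch (XMLStreamException | IOException e) {
            throw new IllegalArgumentException("Could not transform order message", e);
        }
        return out.toString();
    }
}
```

However many order-items the input contains, the in-memory state stays at one header value plus one small map.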

Configuring Smooks to capture multiple NodeModels for use by the FreeMarker templates is just a matter of configuring the DomModelCreator visitor, targeting it at the root node of each of the models. Note again that Smooks also makes this available to SAX filtering (the key to processing huge messages). The Smooks configuration for creating the NodeModels for this message is:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <!--
        Create 2 NodeModels. One high level model for the "order"
        (header, etc...) and then one for the "order-item" elements...
    -->
    <resource-config selector="order,order-item">
        <!-- resource: the DomModelCreator visitor -->
    </resource-config>

    <!-- FreeMarker templating configs to be added below... -->

</smooks-resource-list>

Now the FreeMarker templates need to be added. We need to apply 3 templates in total:

A template to output the order "header" details, up to but not including the order items.

A template for each of the order items, to generate the <item> elements in the output message.

A template to close out the message.

With Smooks, we implement this by defining 2 FreeMarker templates: one to cover #1 and #3 (combined) above, and a second to cover the <item> elements.

The first FreeMarker template is targeted at the <order-items> element and looks as follows:

<ftl:freemarker applyOnElement="order-items">

You will notice the <?TEMPLATE-SPLIT-PI?> processing instruction. This tells Smooks where to split the template, outputting the first part of the template at the start of the <order-items> element, and the other part at the end of the <order-items> element. The <item> template (the second template) will be output in between.

The second FreeMarker template is very straightforward. It simply outputs the <item> elements at the end of every <order-item> element in the source message:

    <ftl:freemarker applyOnElement="order-item">
        <ftl:template><!-- <item>

Because the second template fires on the end of the <order-item> elements, it effectively generates output into the location of the <?TEMPLATE-SPLIT-PI?> processing instruction in the first template. Note that the second template could also have referenced data in the "order" NodeModel.
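The effect of the split processing instruction on output ordering can be sketched in a few lines of plain Java. This is a hypothetical illustration (the `SplitTemplateSketch` class is invented here), not the Smooks FreeMarker integration: the outer template is cut once at `<?TEMPLATE-SPLIT-PI?>`, the first part is written when `<order-items>` starts, each order-item's output is appended, and the second part is written when `<order-items>` ends.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of how a split template wraps the per-item output.
final class SplitTemplateSketch {

    static final String SPLIT_PI = "<?TEMPLATE-SPLIT-PI?>";

    static String render(String outerTemplate, List<String> itemOutputs) {
        // Split once, at the first PI; a positive limit keeps an empty trailing part.
        String[] parts = outerTemplate.split(Pattern.quote(SPLIT_PI), 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Template contains no " + SPLIT_PI);
        }
        StringBuilder out = new StringBuilder(parts[0]);   // written at the start of <order-items>
        for (String item : itemOutputs) {
            out.append(item);                              // written at the end of each <order-item>
        }
        return out.append(parts[1]).toString();            // written at the end of <order-items>
    }
}
```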

And that’s it! This is available as a runnable example in the Tutorials section.

This approach to performing a One-to-One Transformation of a huge message works simply because the only objects in memory at any one time are the order header details and the current order-item details (in the Virtual Object Model). It cannot work if the transformation is so obscure as to always require full access to all the data in the source message, e.g. if the message needs to have all the order items reversed in order (or sorted). In such a case, however, you do have the option of routing the order details and items to a database and then using the database’s storage, query and paging features to perform the transformation.

Splitting & Routing

Smooks supports a number of options when it comes to splitting and routing fragments. The ability to split the stream into fragments and route these fragments to different endpoints (File, JMS, etc.) is a fundamental capability. Smooks improves this capability with the following features:

Basic Fragment Splitting: basic splitting means that no fragment transformation happens prior to routing. Basic splitting and routing involves defining the XPath of the fragment to be split out and defining a routing component (e.g., Apache Camel) to route that unmodified split fragment.

Complex Fragment Splitting: basic fragment splitting works for many use cases and is what most splitting and routing solutions offer. Smooks extends the basic splitting capabilities by allowing you to perform transformations on the split fragment data before routing is applied. For example, the customer details from the order can be merged into each order-item before the split order-item fragment is routed.

In-Flight Stream Splitting & Routing (Huge Message Support): Smooks is able to process gigabyte streams because it can perform in-flight event routing; events are not accumulated when the max.node.depth parameter is left unset.

Multiple Splitting and Routing: conditionally split and route multiple fragments (in different formats: XML, EDI, POJOs, etc.) to different endpoints in a single filtering pass of the source. For example, one could route an OrderItem Java instance to the HighValueOrdersValidation JMS queue for order items with a value greater than $1,000, and route all order items as XML/JSON to an HTTP endpoint for logging.
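The complex and multiple splitting cases above can be pictured with a small plain-Java sketch. It is hypothetical: in-memory lists stand in for the HighValueOrdersValidation JMS queue and the HTTP logging endpoint, and the `OrderItem` class and its fields are assumptions. Each item is first enriched with order-level data, then routed conditionally, all in one pass.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one pass over the order items, enriching each with order-level
// data and routing it to one or more destinations. Lists stand in for JMS/HTTP endpoints.
final class RoutingSketch {

    static final class OrderItem {
        final String product;
        final BigDecimal value;
        String customer;                     // merged in from the order header before routing

        OrderItem(String product, BigDecimal value) {
            this.product = product;
            this.value = value;
        }
    }

    static final BigDecimal HIGH_VALUE_THRESHOLD = new BigDecimal("1000");

    final List<OrderItem> highValueOrdersValidation = new ArrayList<>();  // stands in for a JMS queue
    final List<String> loggingEndpoint = new ArrayList<>();               // stands in for an HTTP endpoint

    void route(String customer, List<OrderItem> items) {
        for (OrderItem item : items) {
            item.customer = customer;                               // complex splitting: enrich first
            if (item.value.compareTo(HIGH_VALUE_THRESHOLD) > 0) {   // "greater than $1,000"
                highValueOrdersValidation.add(item);                // routed as a Java object
            }
            // Every item is also routed, serialized as XML text.
            loggingEndpoint.add("<order-item customer=\"" + item.customer + "\" product=\""
                    + item.product + "\" value=\"" + item.value + "\"/>");
        }
    }
}
```

In Smooks itself this would be declared in configuration rather than coded, and the items would be routed as they stream past instead of being collected into a list first.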

Extending Smooks

All existing Smooks functionality (Java Binding, EDI processing, etc.) is built through extension of a number of well-defined APIs. We will look at these APIs in the coming sections.

The main extension points/APIs in Smooks are:

Reader APIs: Those for processing Source/Input data (Readers) so as to make it consumable by other Smooks components as a series of well-defined hierarchical events (based on the SAX event model) for all of the message fragments and sub-fragments.

Visitor APIs: Those for consuming the message fragment SAX events produced by a source/input reader.

Another very important aspect of writing Smooks extensions is how these components are configured. Because this is common to all Smooks components, we will look at this first.

Configuring Smooks Components

All Smooks components are configured in exactly the same way. As far as the Smooks Core code is concerned, all Smooks components are "resources" and are configured via a ResourceConfig instance, which we talked about in earlier sections.

Smooks provides mechanisms for constructing namespace (XSD) specific XML configurations for components, but the most basic configuration (and the one that maps directly to the ResourceConfig class) is the basic XML configuration from the base configuration namespace (https://www.smooks.org/xsd/smooks-2.0.xsd).

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <resource-config selector="">
        <resource></resource>
        <param name=""></param>
    </resource-config>

</smooks-resource-list>



The selector attribute is the mechanism by which the resource is "selected" e.g. can be an XPath for a visitor. We’ll see more of this in the coming sections.

The resource element is the actual resource. This can be a Java class name or some other form of resource (such as a template). For the purposes of this section, however, let's just assume the resource to be a Java class name.

The param elements are configuration parameters for the resource defined in the resource element.

Smooks takes care of all the details of creating the runtime representation of the resource (e.g. constructing the class named in the resource element) and injecting all the configuration parameters. It also works out what the resource type is, and from that, how to interpret things like the selector e.g., if the resource is a visitor instance, it knows the selector is an XPath, selecting a Source message fragment.

Configuration Annotations

After your component has been created, you need to configure it with the <param> element details. This is done using the @Inject annotation.


The @Inject annotation reflectively injects the named parameter (from the <param> elements) that has the same name as the annotated property itself (the name can actually be different, but by default it matches against the name of the component property).

Suppose we have a component as follows:

public class DataSeeder {

    @Inject
    private File seedDataFile;

    public File getSeedDataFile() {
        return seedDataFile;
    }

    // etc...
}

We configure this component in Smooks as follows:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <resource-config selector="dataSeeder">
        <param name="seedDataFile">./seedData.xml</param>
    </resource-config>

</smooks-resource-list>


This annotation eliminates a lot of noisy code from your component because it:

Handles decoding of the value before setting it on the annotated component property. Smooks provides type converters for all the main types (Integer, Double, File, Enums, etc.), but you can implement and use a custom TypeConverter where the out-of-the-box converters don’t cover specific decoding requirements. Smooks will automatically use your custom converter if it is registered. See the TypeConverter Javadocs for details on registering a TypeConverter implementation such that Smooks will automatically locate it for converting a specific data type.

Supports enum constraints for the injected property, generating a configuration exception where the configured value is not one of the defined choice values. For example, you may have a property which has a constrained value set of "ON" and "OFF". You can use an enum for the property type to constrain the value, raise exceptions, etc.:

@Inject
private OnOffEnum foo;

Can specify default property values:

@Inject
private Boolean foo = true;

Can specify whether the property is optional:

@Inject
private java.util.Optional<Boolean> foo;

By default, all properties are required, but setting a default implicitly marks the property as being optional.
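To make the injection behaviour concrete, here is a hypothetical, reflection-based sketch of what such an annotation processor does with `<param>` values: match by field name, decode the string to the field type (`String`, `File`, `Integer`, `Boolean`, enums), keep a field initializer as the default, and fail when a required value is missing. The `@Param` annotation and the `ParamInjector` and `SampleComponent` classes are invented for this illustration; they are not the Smooks `@Inject` handling or its `TypeConverter` registry, and `Optional` properties are not covered.

```java
import java.io.File;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Map;

// Hypothetical stand-in for a parameter-injection annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Param {
}

enum OnOff { ON, OFF }

// Example component in the spirit of DataSeeder above. Boxed types only, so that
// "no initializer" can be detected as null.
final class SampleComponent {
    @Param File seedDataFile;          // required
    @Param OnOff mode;                 // required, constrained to ON/OFF
    @Param Boolean verbose = true;     // default value => optional
}

final class ParamInjector {

    static void inject(Object component, Map<String, String> params) {
        for (Field field : component.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Param.class)) {
                continue;
            }
            try {
                field.setAccessible(true);
                String raw = params.get(field.getName());       // matched by property name
                if (raw == null) {
                    if (field.get(component) == null) {         // no initializer => required
                        throw new IllegalStateException("Missing required param: " + field.getName());
                    }
                    continue;                                    // keep the default value
                }
                field.set(component, decode(raw.trim(), field.getType()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot inject " + field.getName(), e);
            }
        }
    }

    static Object decode(String raw, Class<?> type) {
        if (type == String.class) return raw;
        if (type == File.class) return new File(raw);
        if (type == Integer.class) return Integer.valueOf(raw);
        if (type == Boolean.class) return Boolean.valueOf(raw);
        if (type.isEnum()) {
            for (Object constant : type.getEnumConstants()) {
                if (((Enum<?>) constant).name().equals(raw)) {
                    return constant;
                }
            }
            // Configuration error: value is not one of the allowed choices.
            throw new IllegalArgumentException("'" + raw + "' is not one of "
                    + Arrays.toString(type.getEnumConstants()));
        }
        throw new IllegalArgumentException("No decoder for " + type.getName());
    }
}
```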

@PostConstruct and @PreDestroy

The Inject annotation is great for configuring your component with simple values, but sometimes your component needs more involved configuration for which we need to write some "initialization" code. For this, Smooks provides @PostConstruct.

On the other side of this, there are times when we need to undo work performed during initialization when the associated Smooks instance is being discarded (garbage collected), e.g. to release some resources acquired during initialization. For this, Smooks provides @PreDestroy.

The basic initialization/un-initialization sequence can be described as follows:

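smooks = new Smooks(..);

    // Initialize all annotated components (@PostConstruct methods are called)

        // Use the smooks instance through a series of filterSource invocations...
        smooks.filterSource(...);
        smooks.filterSource(...);
        ... etc ...

smooks.close();

    // Uninitialize all annotated components (@PreDestroy methods are called)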

In the following example, let's assume we have a component that opens multiple connections to a database on initialization and then needs to release all those database resources when we close the Smooks instance.

public class MultiDataSourceAccessor {

    @Inject
    private File dataSourceConfig;

    private Map<String, DataSource> datasources = new HashMap<>();

    @PostConstruct
    public void createDataSources() {
        // Add DS creation code here....
        // Read the dataSourceConfig property to read the DS configs...
    }

    @PreDestroy
    public void releaseDataSources() {
        // Add DS release code here....
    }

    // etc...
}


@PostConstruct and @PreDestroy methods must be public, zero-arg methods.

@Inject properties are all initialized before the first @PostConstruct method is called. Therefore, you can use @Inject component properties as input to the initialization process.

@PreDestroy methods are all called in response to a call to the Smooks.close method.
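The ordering rules above can be modelled with a small hypothetical container. It uses locally defined `@Init` and `@Destroy` annotations in place of `@PostConstruct` and `@PreDestroy` (the `javax.annotation` versions are no longer shipped with the JDK from Java 11 onwards) and only shows the sequence: configure, call initializers, use, then call the destroy methods on `close()`. `LifecycleContainer` and `ConnectionHolder` are invented names; this is not the Smooks lifecycle code.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for @PostConstruct / @PreDestroy.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Init {
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Destroy {
}

// Hypothetical container: components are registered after their parameters have been
// injected, so @Init methods can rely on them; close() plays the role of Smooks.close().
final class LifecycleContainer implements AutoCloseable {
    private final List<Object> components = new ArrayList<>();

    void register(Object configuredComponent) {
        invokeAll(configuredComponent, Init.class);
        components.add(configuredComponent);
    }

    @Override
    public void close() {
        // This sketch destroys in reverse registration order; the text above does not specify one.
        for (int i = components.size() - 1; i >= 0; i--) {
            invokeAll(components.get(i), Destroy.class);
        }
        components.clear();
    }

    private static void invokeAll(Object component, Class<? extends Annotation> marker) {
        for (Method method : component.getClass().getMethods()) {   // public methods only
            if (!method.isAnnotationPresent(marker)) {
                continue;
            }
            if (method.getParameterCount() != 0 || Modifier.isStatic(method.getModifiers())) {
                throw new IllegalStateException(method + " must be a public, zero-arg instance method");
            }
            try {
                method.invoke(component);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Lifecycle call failed: " + method, e);
            }
        }
    }
}

// Example component in the spirit of MultiDataSourceAccessor above.
final class ConnectionHolder {
    final List<String> events = new ArrayList<>();

    @Init
    public void open() {
        events.add("open");
    }

    @Destroy
    public void release() {
        events.add("release");
    }
}
```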

Defining Custom Configuration Namespaces

Smooks supports a mechanism for defining custom configuration namespaces for components. This allows you to support custom, XSD-based (validatable) configurations for your components instead of treating them all as vanilla Smooks resources via the base configuration.

The basic process involves:

Writing a configuration XSD for your component that extends the base https://www.smooks.org/xsd/smooks-2.0.xsd configuration namespace. This XSD must be supplied on the classpath with your component. It must be located in the /META-INF folder and have the same path as the namespace URI. For example, if your extended namespace URI is http://www.acme.com/schemas/smooks/acme-core-1.0.xsd, then the physical XSD file must be supplied on the classpath in "/META-INF/schemas/smooks/acme-core-1.0.xsd".

Writing a Smooks configuration namespace mapping configuration file that maps the custom namespace configuration into a ResourceConfig instance. This file must be named (by convention) based on the name of the namespace it is mapping and must be physically located on the classpath in the same folder as the XSD. Extending the above example, the Smooks mapping file would be "/META-INF/schemas/smooks/acme-core-1.0.xsd-smooks.xml". Note the "-smooks.xml" suffix.
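The naming convention in the two steps above is mechanical, so it can be expressed as a small helper. This is a hypothetical sketch (the `NamespaceResources` class is invented here, not part of Smooks) that derives both classpath locations from a namespace URI and checks whether they are actually packaged, which can be handy in a unit test for a custom component.

```java
import java.net.URI;

// Hypothetical helper illustrating the convention described above: the XSD lives
// under /META-INF at the namespace URI's path, and the mapping file has the same
// name plus a "-smooks.xml" suffix.
final class NamespaceResources {

    static String xsdClasspathPath(String namespaceUri) {
        String path = URI.create(namespaceUri).getPath();   // e.g. "/schemas/smooks/acme-core-1.0.xsd"
        return "/META-INF" + path;
    }

    static String mappingClasspathPath(String namespaceUri) {
        return xsdClasspathPath(namespaceUri) + "-smooks.xml";
    }

    /** True if both the XSD and the mapping file can be found on the classpath. */
    static boolean isPackaged(String namespaceUri) {
        return NamespaceResources.class.getResource(xsdClasspathPath(namespaceUri)) != null
                && NamespaceResources.class.getResource(mappingClasspathPath(namespaceUri)) != null;
    }
}
```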

The easiest way to get familiar with this mechanism is by looking at existing extended namespace configurations within the Smooks code itself. All Smooks components (including e.g. the Java Binding functionality) use this mechanism for defining their configurations. Smooks Core itself defines a number of extended configuration namespaces, as can be seen in the source.

Implementing a Source Reader

Implementing and configuring a new Source Reader for Smooks is straightforward. The Smooks specific parts of the process are easy and are not really the issue. The level of effort involved is a function of the complexity of the Source data format for which you are implementing the reader.

Implementing a Reader for your custom data format immediately opens all Smooks capabilities to that data format, e.g. Java Binding, Templating, Persistence, Validation, Splitting & Routing, etc. So a relatively small investment can yield a quite significant return. The only requirement, from a Smooks perspective, is that the Reader implements the standard org.xml.sax.XMLReader interface from the Java JDK. However, if you want to be able to configure the Reader implementation, it needs to implement the org.smooks.api.resource.reader.SmooksXMLReader interface (which is just an extension of org.xml.sax.XMLReader). So, you can easily use (or extend) an existing org.xml.sax.XMLReader implementation, or implement a new Reader from scratch.

Let’s now look at a simple example of implementing a Reader for use with Smooks. In this example, we will implement a Reader that can read a stream of Comma Separated Value (CSV) records, converting the CSV stream into a stream of SAX events that can be processed by Smooks, allowing you to do all the things Smooks allows (Java Binding, etc.).

We start by implementing the basic Reader class:

public class MyCSVReader implements SmooksXMLReader {

    // Implement all of the XMLReader methods...
}

Two methods from the XMLReader interface are of particular interest:

setContentHandler(ContentHandler): This method is called by Smooks Core. It sets the ContentHandler instance for the reader. The ContentHandler instance methods are called from inside the parse(InputSource) method.

parse(InputSource): This is the method that receives the Source data input stream, parses it (i.e. in the case of this example, the CSV stream) and generates the SAX event stream through calls to the ContentHandler instance supplied in the setContentHandler(ContentHandler) method.

We need to configure our CSV reader with the names of the fields associated with the CSV records. Configuring a custom reader implementation is the same as for any Smooks component, as described in the Configuring Smooks Components section above.

So focusing a little more closely on the above methods and our fields configuration:

public class MyCSVReader implements SmooksXMLReader {

    private ContentHandler contentHandler;

    @Inject
    private String[] fields; // Auto decoded and injected from the "fields" <param> on the reader config.

    public void setContentHandler(ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }

    public void parse(InputSource csvInputSource) throws IOException, SAXException {
        // TODO: Implement parsing of CSV Stream...
    }

    // Other XMLReader methods...
}

So now we have our basic Reader implementation stub. We can start writing unit tests to test the new reader implementation.

The first thing we need is some sample CSV input. Let's use a simple list of names:


Tom,Fennelly
Mike,Fennelly
Mark,Jones

The second thing we need is a test Smooks configuration to configure Smooks with our MyCSVReader. As stated before, everything in Smooks is a resource and can be configured with the basic <resource-config> configuration. While this works fine, it’s a little noisy, so Smooks provides a <reader> configuration element specifically for configuring a reader. The configuration for our test looks like the following:

<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <reader class="com.acme.MyCSVReader">
        <params>
            <param name="fields">firstname,lastname</param>
        </params>
    </reader>

</smooks-resource-list>

And of course we need the JUnit test class:

public class MyCSVReaderTest extends TestCase {

    public void test() throws Exception {
        Smooks smooks = new Smooks(getClass().getResourceAsStream("mycsvread-config.xml"));
        StringResult serializedCSVEvents = new StringResult();

        smooks.filterSource(new StreamSource(getClass().getResourceAsStream("names.csv")), serializedCSVEvents);

        System.out.println(serializedCSVEvents.getResult());

        // TODO: add assertions, etc...
    }
}

So now we have a basic setup with our custom Reader implementation, as well as a unit test that we can use to drive our development. Of course, our reader's parse method is not doing anything yet and our test class is not making any assertions. So let's start implementing the parse method:

public class MyCSVReader implements SmooksXMLReader {

    private ContentHandler contentHandler;

    @Inject
    private String[] fields; // Auto decoded and injected from the "fields" <param> on the reader config.

    public void setContentHandler(ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }

    public void parse(InputSource csvInputSource) throws IOException, SAXException {
        BufferedReader csvRecordReader = new BufferedReader(csvInputSource.getCharacterStream());
        String csvRecord;

        // Send the start of message events to the handler...
        contentHandler.startDocument();
        contentHandler.startElement(XMLConstants.NULL_NS_URI, "message-root", "", new AttributesImpl());

        csvRecord = csvRecordReader.readLine();
        while(csvRecord != null) {
            String[] fieldValues = csvRecord.split(",");

            // perform checks...

            // Send the events for this record...
            contentHandler.startElement(XMLConstants.NULL_NS_URI, "record", "", new AttributesImpl());
            for(int i = 0; i < fields.length; i++) {
                contentHandler.startElement(XMLConstants.NULL_NS_URI, fields[i], "", new AttributesImpl());
                contentHandler.characters(fieldValues[i].toCharArray(), 0, fieldValues[i].length());
                contentHandler.endElement(XMLConstants.NULL_NS_URI, fields[i], "");
            }
            contentHandler.endElement(XMLConstants.NULL_NS_URI, "record", "");

            csvRecord = csvRecordReader.readLine();
        }

        // Send the end of message events to the handler...
        contentHandler.endElement(XMLConstants.NULL_NS_URI, "message-root", "");
        contentHandler.endDocument();
    }

    // Other XMLReader methods...
}

If you run the unit test class now, you should see the following output on the console (formatted):


After this, it is just a case of expanding the tests, hardening the reader implementation code, etc.
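One way to harden and unit-test the parsing logic without a Smooks runtime is to run it against plain JDK SAX types. The sketch below is hypothetical: `HardenedCsvReader` extends `org.xml.sax.helpers.XMLFilterImpl` (so the other `XMLReader` methods get default implementations) rather than implementing `SmooksXMLReader`, takes its field names through the constructor instead of `@Inject`, assumes UTF-8 when only a byte stream is supplied, rejects records with the wrong number of values, and passes each element name as both local and qualified name. `EventRecorder` is a throwaway `ContentHandler` that turns the events back into XML text for assertions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical hardened CSV reader using only JDK SAX types (no Smooks dependency).
final class HardenedCsvReader extends XMLFilterImpl {
    private final String[] fields;

    HardenedCsvReader(String... fields) {
        this.fields = fields.clone();
    }

    @Override
    public void parse(InputSource source) throws IOException, SAXException {
        ContentHandler handler = getContentHandler();
        Reader chars = source.getCharacterStream();
        if (chars == null && source.getByteStream() != null) {
            // Assumption: byte input is UTF-8 encoded.
            chars = new InputStreamReader(source.getByteStream(), StandardCharsets.UTF_8);
        }
        if (chars == null) {
            throw new SAXException("InputSource has neither a character nor a byte stream");
        }
        BufferedReader lines = new BufferedReader(chars);
        handler.startDocument();
        open(handler, "message-root");
        int lineNumber = 0;
        String line;
        while ((line = lines.readLine()) != null) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue;                                    // tolerate blank lines
            }
            String[] values = line.split(",", -1);           // -1 keeps trailing empty values
            if (values.length != fields.length) {
                throw new SAXException("Line " + lineNumber + ": expected " + fields.length
                        + " values but found " + values.length);
            }
            open(handler, "record");
            for (int i = 0; i < fields.length; i++) {
                open(handler, fields[i]);
                handler.characters(values[i].toCharArray(), 0, values[i].length());
                close(handler, fields[i]);
            }
            close(handler, "record");
        }
        close(handler, "message-root");
        handler.endDocument();
    }

    // The element name is passed as both local name and qualified name.
    private static void open(ContentHandler handler, String name) throws SAXException {
        handler.startElement(XMLConstants.NULL_NS_URI, name, name, new AttributesImpl());
    }

    private static void close(ContentHandler handler, String name) throws SAXException {
        handler.endElement(XMLConstants.NULL_NS_URI, name, name);
    }
}

// Test helper: records the SAX events as compact XML text (no escaping).
final class EventRecorder extends DefaultHandler {
    final StringBuilder xml = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        xml.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        xml.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        xml.append(ch, start, length);
    }
}
```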

Now you can use your reader to perform all sorts of operations supported by Smooks. As an example, the following configuration could be used to bind the names into a List of PersonName objects:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd" xmlns:jb="https://www.smooks.org/xsd/smooks/javabean-1.6.xsd">

    <reader class="com.acme.MyCSVReader">
        <params>
            <param name="fields">firstname,lastname</param>
        </params>
    </reader>

    <jb:bean beanId="peopleNames" class="java.util.ArrayList" createOnElement="message-root">
        <jb:wiring beanIdRef="personName" />
    </jb:bean>

    <jb:bean beanId="personName" class="com.acme.PersonName" createOnElement="message-root/record">
        <jb:value property="first" data="record/firstname" />
        <jb:value property="last" data="record/lastname" />
    </jb:bean>

</smooks-resource-list>

And then a test for this configuration could look as follows:

public class MyCSVReaderTest extends TestCase {

    public void test_java_binding() throws Exception {
        Smooks smooks = new Smooks(getClass().getResourceAsStream("java-binding-config.xml"));
        JavaResult javaResult = new JavaResult();

        smooks.filterSource(new StreamSource(getClass().getResourceAsStream("names.csv")), javaResult);

        List<PersonName> peopleNames = (List<PersonName>) javaResult.getBean("peopleNames");

        // TODO: add assertions etc...
    }
}

For more on Java Binding, see the Java Binding section.


Reader instances are never used concurrently. Smooks Core will create a new instance for every message, or, will pool and reuse instances as per the readerPoolSize FilterSettings property.

If your Reader requires access to the Smooks ExecutionContext for the current filtering context, your Reader needs to implement the SmooksXMLReader interface.

If your Source data is a binary data stream your Reader must implement the StreamReader interface. See next section.

You can programmatically configure your reader (e.g. in your unit tests) using a GenericReaderConfigurator instance, which you then set on the Smooks instance.

While the basic configuration is fine, it’s possible to define a custom configuration namespace (XSD) for your custom CSV Reader implementation. This topic is not covered here. Review the source code to see the extended configuration namespace for the Reader implementations supplied with Smooks (out-of-the-box), e.g. the EDIReader, CSVReader, JSONReader, etc. From this, you should be able to work out how to do this for your own custom Reader.
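The pooling behaviour mentioned in the first note above (and in the Performance Tuning tips) can be sketched as a bounded pool. This is a hypothetical illustration (the `InstancePool` class is invented here), not the Smooks implementation behind `readerPoolSize`: an instance is only ever held by one caller at a time, a new one is created when no idle instance is available, returned instances beyond the pool size are dropped, and a size of zero disables pooling.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical bounded pool: never hands the same instance to two callers at once.
final class InstancePool<T> {
    private final BlockingQueue<T> idle;      // null when pooling is disabled
    private final Supplier<T> factory;

    InstancePool(int poolSize, Supplier<T> factory) {
        this.idle = poolSize > 0 ? new ArrayBlockingQueue<>(poolSize) : null;
        this.factory = factory;
    }

    /** Takes an idle instance, or creates a new one if none is available. */
    T borrow() {
        T instance = (idle == null) ? null : idle.poll();
        return instance != null ? instance : factory.get();
    }

    /** Returns an instance for reuse; if the pool is full (or disabled) it is discarded. */
    void release(T instance) {
        if (idle != null) {
            idle.offer(instance);
        }
    }
}
```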

Implementing a Binary Source Reader

Prior to Smooks v1.5, binary readers needed to implement the StreamReader interface. This is no longer a requirement. All XMLReader instances receive an InputSource (to their parse method) that contains an InputStream if an InputStream was provided in the StreamSource passed to the Smooks.filterSource method call. This means that all XMLReader instances are guaranteed to receive an InputStream if one is available, so there is no need to mark the XMLReader instance.
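In practice, this means a binary reader's `parse` method can simply check which stream the `InputSource` carries. The following is a hypothetical helper (plain JDK, not a Smooks API; the `BinaryInput` name is invented here) that reads the byte stream and fails clearly when only a character stream is available, which would indicate the data was supplied as text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.xml.sax.InputSource;

// Hypothetical helper for a binary XMLReader's parse(InputSource) method.
final class BinaryInput {

    /** Reads all bytes from the InputSource's byte stream; fails if only characters are available. */
    static byte[] readBytes(InputSource source) throws IOException {
        InputStream in = source.getByteStream();
        if (in == null) {
            throw new IOException("No InputStream in InputSource; was the Source created from a Reader?");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```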

Implementing a Flat File Source Reader

In Smooks v1.5 we tried to make it a little easier to implement a custom reader for reading flat file data formats. By flat file we mean "record" based data formats, where the data in the message is structured in flat records as opposed to a more hierarchical structure. Examples of this would be Comma Separated Value (CSV) and Fixed Length Field (FLF). The new API introduced in Smooks v1.5 should remove the complexity of the XMLReader API (as outlined above).

The API is composed of 2 interfaces plus a number of support classes. These interfaces work as a pair. They need to be implemented if you wish to use this API for processing a custom Flat File format not already supported by Smooks.

/**
 * {@link RecordParser} factory class.
 * <p/>
 * Configurable by the Smooks {@link org.smooks.cdr.annotation.Configurator}
 */
public interface RecordParserFactory {

    /**
     * Create a new Flat File {@link RecordParser} instance.
     * @return A new {@link RecordParser} instance.
     */
    RecordParser newRecordParser();
}

/**
 * Flat file Record Parser.
 */
public interface RecordParser<T extends RecordParserFactory>  {

    /**
     * Set the parser factory that created the parser instance.
     * @param factory The parser factory that created the parser instance.
     */
    void setRecordParserFactory(T factory);

    /**
     * Set the Flat File data source on the parser.
     * @param source The flat file data source.
     */
    void setDataSource(InputSource source);

    /**
     * Parse the next record from the message stream and produce a {@link Record} instance.
     * @return The records instance.
     * @throws IOException Error reading message stream.
     */
    Record nextRecord() throws IOException;
}

The RecordParserFactory implementation is responsible for creating the RecordParser instances for the Smooks runtime. The RecordParserFactory is the class that Smooks configures, so this is where you place all your @Inject details. The created RecordParser instances are supplied with a reference to the RecordParserFactory instance that created them, so it is easy enough to provide them with access to the configuration via getters on the RecordParserFactory implementation.

The RecordParser implementation is responsible for parsing out each record (a Record contains a set of Fields) in the nextRecord() method. Each instance is supplied with the message stream via the setDataSource(InputSource) method. The RecordParser should store a reference to this InputSource (or a Reader obtained from it) and use it in the nextRecord() method. A new instance of a given RecordParser implementation is created for each message being filtered by Smooks.
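To show how the two interfaces divide the work, here is a plain-JDK sketch built on simplified local copies of them. Everything here is hypothetical: `SimpleRecordParserFactory`, `SimpleRecordParser`, `DelimitedParserFactory` and `DelimitedParser` are invented names, the `Record` type is reduced to a list of field values, `nextRecord()` is assumed to return `null` at the end of the stream, and the delimiter is passed to the factory's constructor where a real factory would receive it through `@Inject`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import org.xml.sax.InputSource;

// Simplified local stand-ins for the interfaces above (not the Smooks types).
interface SimpleRecordParserFactory {
    SimpleRecordParser<?> newRecordParser();
}

interface SimpleRecordParser<T extends SimpleRecordParserFactory> {
    void setRecordParserFactory(T factory);
    void setDataSource(InputSource source);
    List<String> nextRecord() throws IOException;    // assumption: null means end of stream
}

// The factory holds the configuration and hands itself to every parser it creates.
final class DelimitedParserFactory implements SimpleRecordParserFactory {
    private final String delimiter;

    DelimitedParserFactory(String delimiter) {
        this.delimiter = delimiter;
    }

    String getDelimiter() {
        return delimiter;
    }

    @Override
    public DelimitedParser newRecordParser() {
        DelimitedParser parser = new DelimitedParser();  // one parser per message
        parser.setRecordParserFactory(this);
        return parser;
    }
}

final class DelimitedParser implements SimpleRecordParser<DelimitedParserFactory> {
    private DelimitedParserFactory factory;
    private BufferedReader reader;

    @Override
    public void setRecordParserFactory(DelimitedParserFactory factory) {
        this.factory = factory;
    }

    @Override
    public void setDataSource(InputSource source) {
        this.reader = new BufferedReader(source.getCharacterStream());
    }

    @Override
    public List<String> nextRecord() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null;
        }
        // Configuration is read through the factory's getter; -1 keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(factory.getDelimiter()), -1));
    }
}
```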

Configuring your implementation in the Smooks configuration is as simple as the following:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"

    <ff:reader fields="first,second,third" parserFactory="com.acme.ARecordParserFactory">
            <param name="aConfigParameter">aValue</param>
            <param name="bConfigParameter">bValue</param>

    <!-- Other Smooks configurations e.g. <jb:bean> configurations -->


The Flat File configuration also supports basic Java binding configurations, inlined in the reader configuration.

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"

    <ff:reader fields="firstname,lastname,gender,age,country" parserFactory="com.acme.PersonRecordParserFactory">
        <!-- The field names must match the property names on the Person class. -->
        <ff:listBinding beanId="people" class="com.acme.Person" />
    </ff:reader>

</smooks-resource-list>


To execute this configuration:

Smooks smooks = new Smooks(configStream);
JavaResult result = new JavaResult();

smooks.filterSource(new StreamSource(messageReader), result);

List<Person> people = (List<Person>) result.getBean("people");

Smooks also supports creation of Maps from the record set:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <ff:reader fields="firstname,lastname,gender,age,country" parserFactory="com.acme.PersonRecordParserFactory">
        <ff:mapBinding beanId="people" class="com.acme.Person" keyField="firstname" />
    </ff:reader>

</smooks-resource-list>

The above configuration would produce a Map of Person instances, keyed by the "firstname" value of each Person. It would be executed as follows:

Smooks smooks = new Smooks(configStream);
JavaResult result = new JavaResult();

smooks.filterSource(new StreamSource(messageReader), result);

Map<String, Person> people = (Map<String, Person>) result.getBean("people");

Person tom = people.get("Tom");
Person mike = people.get("Mike");

Virtual Models are also supported, so you can define the class attribute as a java.util.Map and have the record field values bound into Map instances, which are in turn added to a List or a Map.

VariableFieldRecordParser and VariableFieldRecordParserFactory

VariableFieldRecordParser and VariableFieldRecordParserFactory are abstract implementations of the RecordParser and RecordParserFactory interfaces. They provide useful base implementations for a Flat File Reader, with base support for:

  • The convenience Java binding configurations (listBinding and mapBinding) outlined in the previous section.
  • Support for "variable field" records, i.e. a flat file message that contains multiple record definitions. The different records are identified by the value of the first field in the record and are defined as follows: fields="book[name,author] | magazine[*]". Note that the record definitions are pipe-separated. "book" records will have a first field value of "book", while "magazine" records will have a first field value of "magazine". An asterisk ("*") as the field definition for a record tells the reader to generate the field names in the generated events (e.g. "field_0", "field_1", etc.).
  • The ability to read the next record chunk, with support for a simple record delimiter, or a regular expression (regex) pattern that marks the beginning of each record.
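The "variable field" definition syntax can be illustrated with a small standalone sketch in plain Java. It does not use Smooks: VariableFieldSketch, parseDefinitions and bindRecord are hypothetical names, and the sketch assumes that wildcard names are numbered from field_0 for the first value after the record-type value, which may differ from the exact numbering the Smooks readers produce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Smooks API.
class VariableFieldSketch {

    // Parses a definition such as "book[name,author] | magazine[*]" into an
    // ordered map of record type -> declared field names ("*" is kept as-is).
    static Map<String, List<String>> parseDefinitions(String fields) {
        Map<String, List<String>> definitions = new LinkedHashMap<>();
        for (String recordDef : fields.split("\\|")) {
            String def = recordDef.trim();
            int open = def.indexOf('[');
            int close = def.lastIndexOf(']');
            if (open <= 0 || close < open) {
                throw new IllegalArgumentException("Malformed record definition: " + def);
            }
            List<String> names = new ArrayList<>();
            for (String name : def.substring(open + 1, close).split(",")) {
                names.add(name.trim());
            }
            definitions.put(def.substring(0, open).trim(), names);
        }
        return definitions;
    }

    // Binds one record's values to field names. The first value selects the
    // record definition; the remaining values are mapped in order.
    static Map<String, String> bindRecord(Map<String, List<String>> definitions, List<String> values) {
        List<String> names = definitions.get(values.get(0));
        if (names == null) {
            throw new IllegalArgumentException("Unknown record type: " + values.get(0));
        }
        boolean wildcard = names.equals(List.of("*"));
        if (!wildcard && values.size() - 1 > names.size()) {
            throw new IllegalArgumentException("Too many values for record type: " + values.get(0));
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 1; i < values.size(); i++) {
            // Assumption: wildcard names are generated as field_0, field_1, ...
            String name = wildcard ? "field_" + (i - 1) : names.get(i - 1);
            record.put(name, values.get(i));
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, List<String>> defs = parseDefinitions("book[name,author] | magazine[*]");
        System.out.println(bindRecord(defs, List.of("book", "Dune", "Frank Herbert")));
        System.out.println(bindRecord(defs, List.of("magazine", "Wired", "2021-06")));
    }
}
```

Running main prints {name=Dune, author=Frank Herbert} and then {field_0=Wired, field_1=2021-06}. Using a LinkedHashMap keeps both the record order and the field order exactly as written in the fields attribute.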

The CSV and Regex readers are implemented using these abstract classes. See the csv-variable-record and flatfile-to-xml-regex examples. The Regex Reader implementation is also a good example that can be used as a basis for your own custom flat file reader.

Implementing a Fragment Visitor

Visitors are the workhorse of Smooks. Most of the out-of-the-box functionality in Smooks (Java binding, templating, persistence, etc.) is implemented as one or more visitors. Visitors often collaborate through the ExecutionContext and ApplicationContext objects, working together to accomplish a common goal.

Important: Smooks treats all visitors as stateless objects. A visitor instance must be usable concurrently across multiple messages, that is, across multiple concurrent calls to the Smooks.filterSource method. All state associated with the current Smooks.filterSource execution must be stored in the ExecutionContext. For more details, see the ExecutionContext and ApplicationContext section.
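The stateless rule can be illustrated in plain Java without the Smooks API. RunContext, ItemCountingVisitor and StatelessVisitorDemo are hypothetical names: one shared visitor instance keeps no mutable fields and stores its per-run counter in the context object passed to each call, so two concurrent runs do not see each other's state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a per-execution context (not the Smooks ExecutionContext).
class RunContext {
    private final Map<String, Object> attributes = new HashMap<>();

    Object get(String key) {
        return attributes.get(key);
    }

    void put(String key, Object value) {
        attributes.put(key, value);
    }
}

// Stateless "visitor": the single shared instance has no mutable fields.
// Everything that belongs to one run is kept in the RunContext for that run.
class ItemCountingVisitor {
    private static final String COUNT_KEY = "itemCount";

    void visitAfter(String elementName, RunContext context) {
        if (elementName.equals("item")) {
            Integer count = (Integer) context.get(COUNT_KEY);
            context.put(COUNT_KEY, count == null ? 1 : count + 1);
        }
    }

    int countFor(RunContext context) {
        Integer count = (Integer) context.get(COUNT_KEY);
        return count == null ? 0 : count;
    }
}

class StatelessVisitorDemo {
    // One filterSource-style run over a flat list of element names,
    // with a fresh context created for this run only.
    static int run(ItemCountingVisitor visitor, List<String> elementNames) {
        RunContext context = new RunContext();
        for (String name : elementNames) {
            visitor.visitAfter(name, context);
        }
        return visitor.countFor(context);
    }

    public static void main(String[] args) throws Exception {
        ItemCountingVisitor shared = new ItemCountingVisitor(); // one instance, many runs
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> first = pool.submit(() -> run(shared, List.of("order", "item", "item")));
        Future<Integer> second = pool.submit(() -> run(shared, List.of("order", "item")));
        System.out.println(first.get() + " " + second.get()); // prints "2 1"
        pool.shutdown();
    }
}
```

If ItemCountingVisitor kept its counter in an instance field instead, the two runs would race on it and each would see the other's items.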

SAX NG Visitor API

The SAX NG visitor API is made up of a number of interfaces. These interfaces are based on the SAX events that a SaxNgVisitor implementation can capture and process. Depending on the use case being solved with the SaxNgVisitor implementation, you may need to implement one or all of these interfaces.

BeforeVisitor: Captures the startElement SAX event for the targeted fragment element:

public interface BeforeVisitor extends Visitor {

    void visitBefore(Element element, ExecutionContext executionContext);
}

ChildrenVisitor: Captures the character based SAX events for the targeted fragment element, as well as Smooks generated (pseudo) events corresponding to the startElement events of child fragment elements:

public interface ChildrenVisitor extends Visitor {

    void visitChildText(CharacterData characterData, ExecutionContext executionContext) throws SmooksException, IOException;

    void visitChildElement(Element childElement, ExecutionContext executionContext) throws SmooksException, IOException;
}

AfterVisitor: Captures the endElement SAX event for the targeted fragment element:

public interface AfterVisitor extends Visitor {

    void visitAfter(Element element, ExecutionContext executionContext);
}

As a convenience for implementations that need to capture all the SAX events, the above three interfaces are combined into a single interface: ElementVisitor.

Illustrating these events using a piece of XML:

    <target-fragment>      <--- BeforeVisitor.visitBefore
        Text!!             <--- ChildrenVisitor.visitChildText
        <child>            <--- ChildrenVisitor.visitChildElement
        </child>
    </target-fragment>     <--- AfterVisitor.visitAfter

Note: Of course, the above is just an illustration of a Source message event stream. It looks like XML, but the source could be EDI, CSV, JSON, etc. Think of it as an XML serialization of the Source message event stream, chosen for easy reading.
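To relate these callbacks to plain SAX, the following sketch uses only the JDK's SAX parser to record which visitor callback each raw event would correspond to for a single target element. VisitorEventTrace is a hypothetical class, not part of Smooks; it assumes visitChildElement fires only for direct children of the target, and it skips whitespace-only text to keep the trace short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not Smooks code: records which visitor callback each raw
// SAX event would correspond to for one target element name.
class VisitorEventTrace extends DefaultHandler {
    private final String target;
    private final List<String> events = new ArrayList<>();
    // -1 = outside the target; 0 = directly inside it; >0 = inside a descendant
    private int depth = -1;

    VisitorEventTrace(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (depth < 0 && qName.equals(target)) {
            events.add("visitBefore");
            depth = 0;
        } else if (depth >= 0) {
            if (depth == 0) {
                // Assumption: only direct children are reported.
                events.add("visitChildElement:" + qName);
            }
            depth++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        // Whitespace-only text is skipped to keep the trace short.
        if (depth == 0 && !text.isEmpty()) {
            events.add("visitChildText:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) {
            events.add("visitAfter"); // the target element itself is closing
            depth = -1;
        } else if (depth > 0) {
            depth--;
        }
    }

    static List<String> trace(String xml, String target) {
        VisitorEventTrace handler = new VisitorEventTrace(target);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse message", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<message><target-fragment>Text!!<child></child></target-fragment></message>";
        System.out.println(trace(xml, "target-fragment"));
    }
}
```

For the message in main, the trace starts with visitBefore, lists the child text and child element events, and ends with visitAfter. Note that a SAX parser may split text across several characters calls, which is why text content has to be accumulated explicitly.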

Element: As can be seen from the above SAX NG interfaces, an Element is passed to visitBefore, visitChildElement and visitAfter (visitChildText receives the CharacterData node instead). This object contains details about the targeted fragment element, including its attributes and their values. Text accumulation and StreamResult writing are discussed in the following sections.

Text Accumulation

SAX is a stream-based processing model. It doesn't create a Document Object Model (DOM) of any form, and it doesn't accumulate event data in any way. This is why it is well suited to processing huge message streams.

The Element will always contain the attributes associated with the targeted element, but it will not contain the fragment's child text data, whose SAX events (ChildrenVisitor.visitChildText) occur between the BeforeVisitor.visitBefore and AfterVisitor.visitAfter events (see the illustration above). The filter does not accumulate text events on the Element because, as already stated, that could cause a significant performance drain. The downside is that if your SaxNgVisitor implementation needs access to the text content of a fragment, you need to explicitly tell Smooks to accumulate text for the targeted fragment. You do this by stashing the text into a memento from within the ChildrenVisitor.visitChildText method and then restoring the memento from within the AfterVisitor.visitAfter method of your SaxNgVisitor, as shown below:

public class MyVisitor implements ChildrenVisitor, AfterVisitor {

    public void visitChildText(CharacterData characterData, ExecutionContext executionContext) {
        // Stash each text event into a memento keyed on the parent (target) element.
        executionContext.getMementoCaretaker().stash(new TextAccumulatorMemento(new NodeVisitable(characterData.getParentNode()), this), textAccumulatorMemento -> textAccumulatorMemento.accumulateText(characterData.getTextContent()));
    }

    public void visitChildElement(Element childElement, ExecutionContext executionContext) {
        // Not needed for text accumulation.
    }

    public void visitAfter(Element element, ExecutionContext executionContext) {
        TextAccumulatorMemento textAccumulatorMemento = new TextAccumulatorMemento(new NodeVisitable(element), this);
        // Restore the stashed memento to get the accumulated text.
        executionContext.getMementoCaretaker().restore(textAccumulatorMemento);
        String fragmentText = textAccumulatorMemento.getTextContent();

        // ... etc ...
    }
}

Implementing ChildrenVisitor.visitChildText just to tell Smooks to accumulate the text events for the targeted fragment is cumbersome. For that reason, Smooks provides the @TextConsumer annotation, which you can place on your SaxNgVisitor implementation instead of implementing the ChildrenVisitor.visitChildText method:

@TextConsumer
public class MyVisitor implements AfterVisitor {

    public void visitAfter(Element element, ExecutionContext executionContext) {
        String fragmentText = element.getTextContent();

        // ... etc ...
    }
}

Note that the complete fragment text will not be available until the AfterVisitor.visitAfter event.
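The stash-then-restore idea can be sketched without any Smooks types. TextStash is a hypothetical stand-in for the memento caretaker: each stash call plays the role of one visitChildText event, and restore plays the role of the lookup in visitAfter. In Smooks the equivalent state lives in the ExecutionContext, so it stays private to one filterSource execution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the stash/restore pattern; not the Smooks MementoCaretaker.
class TextStash {
    private final Map<Object, StringBuilder> buffers = new HashMap<>();

    // visitChildText-style callback: append one text chunk for the fragment.
    void stash(Object fragmentKey, String chunk) {
        buffers.computeIfAbsent(fragmentKey, key -> new StringBuilder()).append(chunk);
    }

    // visitAfter-style callback: return the full text and discard the buffer.
    String restore(Object fragmentKey) {
        StringBuilder text = buffers.remove(fragmentKey);
        return text == null ? "" : text.toString();
    }

    public static void main(String[] args) {
        TextStash stash = new TextStash();
        Object fragment = new Object(); // identifies one target fragment instance
        // A parser may deliver "Hello, world" as several character events:
        stash.stash(fragment, "Hello");
        stash.stash(fragment, ", ");
        stash.stash(fragment, "world");
        System.out.println(stash.restore(fragment)); // prints "Hello, world"
    }
}
```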

StreamResult Writing/Serialization

The Smooks.filterSource(Source, Result) method can take one or more of a number of different Result type implementations, one of which is the StreamResult class (see Multiple Outputs/Results). By default, Smooks will always serialize the full Source event stream as XML to any StreamResult instance provided to the Smooks.filterSource(Source, Result) method.

So, if the Source provided to the Smooks.filterSource(Source, Result) method is an XML stream and a StreamResult instance is provided as one of the Result instances, the Source XML will be written out to the StreamResult unmodified, unless the Smooks instance is configured with one or more SaxNgVisitor implementations that modify one or more fragments. In other words, Smooks streams the Source in and back out again through the StreamResult instance. Default serialization can be turned on/off by configuring the filter settings.

If you want to modify the serialized form of one of the message fragments (i.e. "transform"), you need to implement a SaxNgVisitor to do so and target it at the message fragment using an XPath-like expression.

Note: Of course, you can also modify the serialized form of a message fragment using one of the out-of-the-box Templating components. These components are also SaxNgVisitor implementations.

The key to implementing a SaxNgVisitor geared towards transforming the serialized form of a fragment is telling Smooks that the SaxNgVisitor implementation in question will be writing to the StreamResult. You need to tell Smooks this because Smooks supports targeting of multiple SaxNgVisitor implementations at a single fragment, but only one SaxNgVisitor is allowed to write to the StreamResult, per fragment. If a second SaxNgVisitor attempts to write to the StreamResult, a SAXWriterAccessException will result and you will need to modify your Smooks configuration.
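The single-writer rule can be modelled in a few lines of plain Java. WriterOwnership is hypothetical and only mimics the behaviour described above: the first visitor to request the writer for a fragment becomes its owner, and a second visitor is rejected with an IllegalStateException standing in for SAXWriterAccessException.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the "one writer owner per fragment" rule; not the Smooks API.
// IllegalStateException stands in for Smooks' SAXWriterAccessException.
class WriterOwnership {
    private final StringWriter output = new StringWriter();
    private final Map<Object, Object> ownerByFragment = new HashMap<>();

    // The first visitor to request the writer for a fragment becomes its owner;
    // the owner may ask again, any other visitor is rejected.
    Writer getWriter(Object fragment, Object visitor) {
        Object owner = ownerByFragment.putIfAbsent(fragment, visitor);
        if (owner != null && owner != visitor) {
            throw new IllegalStateException(visitor + " cannot write: fragment is owned by " + owner);
        }
        return output;
    }

    // Called when the fragment has been fully processed.
    void endFragment(Object fragment) {
        ownerByFragment.remove(fragment);
    }

    String written() {
        return output.toString();
    }

    public static void main(String[] args) {
        WriterOwnership ownership = new WriterOwnership();
        Object fragment = new Object();
        try {
            ownership.getWriter(fragment, "visitorA").write("<order-item/>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            ownership.getWriter(fragment, "visitorB");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        ownership.endFragment(fragment);
        System.out.println(ownership.written()); // <order-item/>
    }
}
```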

To be "the one" that writes to the StreamResult, the SaxNgVisitor needs to acquire ownership of the Writer to the StreamResult. It does this simply by calling the ExecutionContext.getWriter().write(...) method from inside its BeforeVisitor.visitBefore implementation:

public class MyVisitor implements ElementVisitor {

    public void visitBefore(Element element, ExecutionContext executionContext) {
        Writer writer = executionContext.getWriter();

        // ... write the start of the fragment...
    }

    public void visitChildText(CharacterData characterData, ExecutionContext executionContext) {
        Writer writer = executionContext.getWriter();

        // ... write the child text...
    }

    public void visitChildElement(Element childElement, ExecutionContext executionContext) {
    }

    public void visitAfter(Element element, ExecutionContext executionContext) {
        Writer writer = executionContext.getWriter();

        // ... close the fragment...
    }
}

Note: If you need to control serialization of sub-fragments, you need to reset the Writer instance to divert their serialization. You do this by calling ExecutionContext.setWriter.
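Diverting sub-fragment serialization amounts to swapping the current writer for a buffer and restoring it afterwards. The sketch below is a plain-Java model of that idea: WriterContext only mirrors the getWriter/setWriter pair and is not the Smooks ExecutionContext, and the upper-casing is an arbitrary example transformation applied to the captured output.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Locale;

// Hypothetical holder of the "current" writer; it only mirrors the
// getWriter()/setWriter() idea and is not the Smooks ExecutionContext.
class WriterContext {
    private Writer writer;

    WriterContext(Writer writer) {
        this.writer = writer;
    }

    Writer getWriter() {
        return writer;
    }

    void setWriter(Writer writer) {
        this.writer = writer;
    }
}

class DivertSubFragments {
    // Diverts everything the sub-fragments write into a buffer, restores the
    // original writer, then emits a transformed version of the buffered output.
    static void serializeWrapped(WriterContext context, Runnable subFragments) {
        Writer original = context.getWriter();
        StringWriter buffer = new StringWriter();
        context.setWriter(buffer);
        try {
            subFragments.run();
        } finally {
            context.setWriter(original); // always put the real writer back
        }
        write(context, "<wrapper>" + buffer.toString().toUpperCase(Locale.ROOT) + "</wrapper>");
    }

    // Writes to whatever writer is current at the time of the call.
    static void write(WriterContext context, String text) {
        try {
            context.getWriter().write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        WriterContext context = new WriterContext(out);
        serializeWrapped(context, () -> write(context, "<child>text</child>"));
        System.out.println(out); // <wrapper><CHILD>TEXT</CHILD></wrapper>
    }
}
```

Restoring the writer in a finally block ensures that later fragments are written to the real output even if a sub-fragment fails.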

Sometimes you know that the target fragment you are serializing/transforming will never have sub-fragments. In this situation, it’s a bit ugly to have to implement the BeforeVisitor.visitBefore method just to make a call to the ExecutionContext.getWriter().write(...) method to acquire ownership of the Writer. For this reason, we have the @StreamResultWriter annotation. Used in combination with the @TextConsumer annotation, we can remove the need to implement all but the AfterVisitor.visitAfter method:

@TextConsumer
@StreamResultWriter
public class MyVisitor implements AfterVisitor {

    public void visitAfter(Element element, ExecutionContext executionContext) {
        Writer writer = executionContext.getWriter();

        // ... serialize to the writer ...
    }
}

Smooks provides the DomSerializer class to make serializing of element data, as XML, a little easier. This class allows you to write a SaxNgVisitor implementation like:

public class MyVisitor implements ElementVisitor {

    private DomSerializer domSerializer = new DomSerializer(true, true);

    public void visitBefore(Element element, ExecutionContext executionContext) {
        try {
            domSerializer.writeStartElement(element, executionContext.getWriter());
        } catch (IOException e) {
            throw new SmooksException(e);
        }
    }

    public void visitChildText(CharacterData characterData, ExecutionContext executionContext) {
        try {
            domSerializer.writeText(characterData, executionContext.getWriter());
        } catch (IOException e) {
            throw new SmooksException(e);
        }
    }

    public void visitChildElement(Element element, ExecutionContext executionContext) throws SmooksException, IOException {
    }

    public void visitAfter(Element element, ExecutionContext executionContext) throws SmooksException, IOException {
        try {
            domSerializer.writeEndElement(element, executionContext.getWriter());
        } catch (IOException e) {
            throw new SmooksException(e);
        }
    }
}

You may have noticed that the DomSerializer constructor takes two boolean arguments: closeEmptyElements and rewriteEntities, which should be set from the closeEmptyElements and rewriteEntities filter settings, respectively. Smooks provides a small convenience here: if you annotate the DomSerializer field with @Inject, Smooks will create the DomSerializer instance and initialize it with the closeEmptyElements and rewriteEntities filter settings of the associated Smooks instance:

public class MyVisitor implements AfterVisitor {

    @Inject
    private DomSerializer domSerializer;

    public void visitAfter(Element element, ExecutionContext executionContext) throws SmooksException, IOException {
        try {
            domSerializer.writeStartElement(element, executionContext.getWriter());
            domSerializer.writeText(element, executionContext.getWriter());
            domSerializer.writeEndElement(element, executionContext.getWriter());
        } catch (IOException e) {
            throw new SmooksException(e);
        }
    }
}
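To make the two flags concrete, here is a hypothetical MiniSerializer (not the Smooks DomSerializer). It assumes that closeEmptyElements means writing <name/> for an element with no content and that rewriteEntities means escaping &, < and > in text; check the DomSerializer and filter-settings documentation for the exact Smooks semantics.

```java
// Hypothetical mini serializer (not the Smooks DomSerializer) showing what the
// two constructor flags are assumed to control.
class MiniSerializer {
    private final boolean closeEmptyElements;
    private final boolean rewriteEntities;

    MiniSerializer(boolean closeEmptyElements, boolean rewriteEntities) {
        this.closeEmptyElements = closeEmptyElements;
        this.rewriteEntities = rewriteEntities;
    }

    // Serializes an element that has only text content (no attributes, no children).
    String serialize(String name, String text) {
        if (text.isEmpty()) {
            // closeEmptyElements: <name/> instead of <name></name>
            return closeEmptyElements ? "<" + name + "/>" : "<" + name + "></" + name + ">";
        }
        // rewriteEntities: escape markup characters in text
        String body = rewriteEntities ? escape(text) : text;
        return "<" + name + ">" + body + "</" + name + ">";
    }

    private static String escape(String text) {
        // '&' first, so the '&' introduced by the other replacements is not escaped again
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        MiniSerializer both = new MiniSerializer(true, true);
        MiniSerializer neither = new MiniSerializer(false, false);
        System.out.println(both.serialize("note", ""));         // <note/>
        System.out.println(neither.serialize("note", ""));      // <note></note>
        System.out.println(both.serialize("note", "A & B"));    // <note>A &amp; B</note>
        System.out.println(neither.serialize("note", "A & B")); // <note>A & B</note>
    }
}
```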

Visitor Configuration

SaxNgVisitor configuration works in exactly the same way as any other Smooks component. See Configuring Smooks Components.

The most important thing to note when configuring visitor instances is that the selector attribute is interpreted as an XPath-like expression. For more on this, see the docs on Selectors.

Also note that visitors can be programmatically configured on a Smooks instance. Among other things, this is very useful for unit testing.

Example Visitor Configuration

Let’s assume we have a very simple SaxNgVisitor implementation as follows:

public class ChangeItemState implements AfterVisitor {

    @Inject
    private DomSerializer domSerializer;

    @Inject
    private String newState;

    // Fluent setter, used when the visitor is configured programmatically.
    public ChangeItemState setNewState(String newState) {
        this.newState = newState;
        return this;
    }

    public void visitAfter(Element element, ExecutionContext executionContext) {
        element.setAttribute("state", newState);

        try {
            domSerializer.writeStartElement(element, executionContext.getWriter());
            domSerializer.writeText(element, executionContext.getWriter());
            domSerializer.writeEndElement(element, executionContext.getWriter());
        } catch (IOException e) {
            throw new SmooksException(e);
        }
    }
}

Declaratively configuring ChangeItemState to fire on fragments having a status of "OK" is as simple as:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <resource-config selector="order-items/order-item[@status = 'OK']">
        <resource>com.acme.ChangeItemState</resource>
        <param name="newState">COMPLETED</param>
    </resource-config>

</smooks-resource-list>

Of course it would be really nice to be able to define a cleaner and more strongly typed configuration for the ChangeItemState component, such that it could be configured something like:

<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <order:changeItemState itemElement="order-items/order-item[@status = 'OK']" newState="COMPLETED" />

</smooks-resource-list>

For details on this, see the section on Defining Custom Configuration Namespaces.

This visitor could also be programmatically configured on a Smooks instance as follows:

Smooks smooks = new Smooks();

smooks.addVisitor(new ChangeItemState().setNewState("COMPLETED"), "order-items/order-item[@status = 'OK']");

smooks.filterSource(new StreamSource(inReader), new StreamResult(outWriter));
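For comparison, the same change can be expressed with the JDK's DOM and XPath APIs. ChangeItemStateDom is a hypothetical class and deliberately not how Smooks works: it loads the whole message into memory and evaluates the absolute XPath /order-items/order-item[@status = 'OK'], whereas Smooks streams the message and fires ChangeItemState for each matching fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// DOM + XPath analogue of ChangeItemState, using only the JDK. Unlike Smooks,
// this loads the whole document into memory before selecting the fragments.
class ChangeItemStateDom {

    static String changeItemState(String xml, String newState) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Absolute form of the selector used in the Smooks configuration above.
            NodeList items = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/order-items/order-item[@status = 'OK']", doc, XPathConstants.NODESET);
            for (int i = 0; i < items.getLength(); i++) {
                ((Element) items.item(i)).setAttribute("state", newState);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not transform order items", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order-items><order-item status=\"OK\"/><order-item status=\"FAILED\"/></order-items>";
        System.out.println(changeItemState(xml, "COMPLETED"));
    }
}
```

The DOM version is simpler to reason about for small messages, but its memory use grows with the message size, which is the trade-off the streaming visitor model avoids.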

Visitor Instance Lifecycle

One aspect of the visitor lifecycle has already been discussed in the general context of Smooks component initialization and uninitialization.

Smooks supports two additional component lifecycle events, specific to visitor components, via the ExecutionLifecycleCleanable and VisitLifecycleCleanable interfaces.


Visitor components implementing ExecutionLifecycleCleanable can perform lifecycle operations after a Smooks.filterSource execution completes.

public interface ExecutionLifecycleCleanable extends Visitor {

    void executeExecutionLifecycleCleanup(ExecutionContext executionContext);
}

The basic call sequence can be described as follows (note the executeExecutionLifecycleCleanup calls):

smooks = new Smooks(..);

smooks.filterSource(...);
    ** VisitorXX.executeExecutionLifecycleCleanup **
smooks.filterSource(...);
    ** VisitorXX.executeExecutionLifecycleCleanup **
smooks.filterSource(...);
    ** VisitorXX.executeExecutionLifecycleCleanup **
... etc ...

This lifecycle method allows you to ensure that resources scoped around the Smooks.filterSource execution lifecycle can be cleaned up for the associated ExecutionContext.


Visitor components implementing VisitLifecycleCleanable can perform lifecycle operations after each AfterVisitor.visitAfter call.

public interface VisitLifecycleCleanable extends Visitor {

    void executeVisitLifecycleCleanup(ExecutionContext executionContext);
}

The basic call sequence can be described as follows (note the executeVisitLifecycleCleanup calls):


smooks.filterSource(...);
    <target-fragment>      <--- VisitorXX.visitBefore
        Text!!             <--- VisitorXX.visitChildText
        <child>            <--- VisitorXX.visitChildElement
        </child>
    </target-fragment>     <--- VisitorXX.visitAfter
    ** VisitorXX.executeVisitLifecycleCleanup **
    <target-fragment>      <--- VisitorXX.visitBefore
        Text!!             <--- VisitorXX.visitChildText
        <child>            <--- VisitorXX.visitChildElement
        </child>
    </target-fragment>     <--- VisitorXX.visitAfter
    ** VisitorXX.executeVisitLifecycleCleanup **

smooks.filterSource(...);
    <target-fragment>      <--- VisitorXX.visitBefore
        Text!!             <--- VisitorXX.visitChildText
        <child>            <--- VisitorXX.visitChildElement
        </child>
    </target-fragment>     <--- VisitorXX.visitAfter
    ** VisitorXX.executeVisitLifecycleCleanup **
    <target-fragment>      <--- VisitorXX.visitBefore
        Text!!             <--- VisitorXX.visitChildText
        <child>            <--- VisitorXX.visitChildElement
        </child>
    </target-fragment>     <--- VisitorXX.visitAfter
    ** VisitorXX.executeVisitLifecycleCleanup **

This lifecycle method allows you to ensure that resources scoped around a single fragment execution of a SaxNgVisitor implementation can be cleaned up for the associated ExecutionContext.
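The two cleanup hooks can be modelled with a tiny filter loop. CleanableVisitor, LoggingVisitor and MiniFilter are hypothetical and only mirror the call order described above. Wrapping the calls in try/finally, so that cleanup also runs when a visit fails, is a design choice of this sketch, not a documented Smooks guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the callbacks discussed above (not Smooks types).
interface CleanableVisitor {
    void visitBefore(String fragment, List<String> log);
    void visitAfter(String fragment, List<String> log);
    void executeVisitLifecycleCleanup(List<String> log);
    void executeExecutionLifecycleCleanup(List<String> log);
}

// Records every callback so the call order is visible.
class LoggingVisitor implements CleanableVisitor {
    public void visitBefore(String fragment, List<String> log) { log.add("before " + fragment); }
    public void visitAfter(String fragment, List<String> log) { log.add("after " + fragment); }
    public void executeVisitLifecycleCleanup(List<String> log) { log.add("visit cleanup"); }
    public void executeExecutionLifecycleCleanup(List<String> log) { log.add("execution cleanup"); }
}

class MiniFilter {
    // One filterSource-style execution over the matching fragments.
    static List<String> filter(CleanableVisitor visitor, List<String> fragments) {
        List<String> log = new ArrayList<>(); // per-execution state
        try {
            for (String fragment : fragments) {
                try {
                    visitor.visitBefore(fragment, log);
                    visitor.visitAfter(fragment, log);
                } finally {
                    visitor.executeVisitLifecycleCleanup(log); // after each visitAfter
                }
            }
        } finally {
            visitor.executeExecutionLifecycleCleanup(log); // once per execution
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(filter(new LoggingVisitor(), List.of("f1", "f2")));
        // [before f1, after f1, visit cleanup, before f2, after f2, visit cleanup, execution cleanup]
    }
}
```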


ExecutionContext and ApplicationContext

ExecutionContext is scoped specifically around a single execution of the Smooks.filterSource method. All Smooks visitors must be stateless: a visitor is created once per Smooks instance and is referenced across multiple concurrent executions of the Smooks.filterSource method, so any state belonging to a single execution must be stored in the ExecutionContext. All data stored in an ExecutionContext instance is lost on completion of the Smooks.filterSource execution. ExecutionContext is a parameter in all visit invocations.


ApplicationContext is scoped around the associated Smooks instance: only one ApplicationContext instance exists per Smooks instance. This context object can be used to store data that needs to be maintained (and accessible) across multiple Smooks.filterSource executions. Components (any component, including SaxNgVisitor components) can gain access to their associated ApplicationContext instance by declaring an ApplicationContext class property and annotating it with @Inject:

public class MySmooksResource {

    @Inject
    private ApplicationContext appContext;

    // etc...
}
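The difference in lifetime between the two contexts can be sketched with two maps. ScopedEngine is hypothetical, not the Smooks API: the execution-scoped map is created per call and discarded, while the application-scoped map belongs to the engine instance and is shared by every call, so it uses a thread-safe ConcurrentHashMap and an AtomicInteger.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the two scopes; not the Smooks ExecutionContext or ApplicationContext.
class ScopedEngine {
    // Application scope: one map per engine instance, shared by every run
    // (and possibly by concurrent runs), so it uses a thread-safe map.
    private final Map<String, Object> applicationScope = new ConcurrentHashMap<>();

    // One filterSource-style run.
    String filter(String message) {
        // Execution scope: created for this run only and discarded when it returns.
        Map<String, Object> executionScope = new HashMap<>();
        executionScope.put("upperCased", message.toUpperCase(Locale.ROOT));

        // Data that must survive across runs goes into the application scope.
        AtomicInteger runs = (AtomicInteger) applicationScope
                .computeIfAbsent("runCount", key -> new AtomicInteger());
        runs.incrementAndGet();

        return (String) executionScope.get("upperCased");
    }

    int runCount() {
        AtomicInteger runs = (AtomicInteger) applicationScope.get("runCount");
        return runs == null ? 0 : runs.get();
    }

    public static void main(String[] args) {
        ScopedEngine engine = new ScopedEngine();
        System.out.println(engine.filter("first"));  // FIRST
        System.out.println(engine.filter("second")); // SECOND
        System.out.println(engine.runCount());       // 2
    }
}
```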


You can join these groups and chats to discuss and ask Smooks-related questions:

Mailing list: googlegroups: smooks-user

Chat room about using Smooks: gitter:smooks/smooks

Issue tracker: github:smooks/smooks


Please see the following guidelines if you’d like to contribute code to Smooks.

Download Details:
Author: smooks
Source Code: https://github.com/smooks/smooks
License: View license


Dexter  Goodwin



EpicEditor: an Embeddable JavaScript Markdown Editor

An Embeddable JavaScript Markdown Editor

EpicEditor is an embeddable JavaScript Markdown editor with split fullscreen editing, live previewing, automatic draft saving, offline support, and more. For developers, it offers a robust API, can be easily themed, and allows you to swap out the bundled Markdown parser with anything you throw at it.


Why

Because WYSIWYGs suck. Markdown is quickly becoming the replacement: GitHub, Stack Overflow, and even blogging apps like Posterous now support Markdown. EpicEditor allows you to create a Markdown editor with a single line of JavaScript:

var editor = new EpicEditor().load();

Quick Start

EpicEditor is easy to implement. Add the script and assets to your page, provide a target container and call load().

Step 1: Download

Download the latest release or clone the repo:

$ git clone git@github.com:OscarGodson/EpicEditor

Step 2: Create your container element

<div id="epiceditor"></div>

Step 3: Add the epiceditor.js file

<script src="epiceditor.min.js"></script>

Step 4: Init EpicEditor

var editor = new EpicEditor().load();



The EpicEditor constructor creates a new editor instance. Customize the instance by passing the options parameter. The example below uses all options and their defaults:

var opts = {
  container: 'epiceditor',
  textarea: null,
  basePath: 'epiceditor',
  clientSideStorage: true,
  localStorageName: 'epiceditor',
  useNativeFullscreen: true,
  parser: marked,
  file: {
    name: 'epiceditor',
    defaultContent: '',
    autoSave: 100
  },
  theme: {
    base: '/themes/base/epiceditor.css',
    preview: '/themes/preview/preview-dark.css',
    editor: '/themes/editor/epic-dark.css'
  },
  button: {
    preview: true,
    fullscreen: true,
    bar: "auto"
  },
  focusOnLoad: false,
  shortcut: {
    modifier: 18,
    fullscreen: 70,
    preview: 80
  },
  string: {
    togglePreview: 'Toggle Preview Mode',
    toggleEdit: 'Toggle Edit Mode',
    toggleFullscreen: 'Enter Fullscreen'
  },
  autogrow: false
};
var editor = new EpicEditor(opts);


| Option | Description | Default |
| --- | --- | --- |
| container | The ID (string) or element (object) of the target container in which you want the editor to appear. | epiceditor |
| textarea | The ID (string) or element (object) of a textarea you would like to sync the editor's content with. On page load, if there is content in the textarea, the editor will use that as its content. | null |
| basePath | The base path of the directory containing the /themes. | epiceditor |
| clientSideStorage | Setting this to false will disable localStorage. | true |
| localStorageName | The name to use for the localStorage object. | epiceditor |
| useNativeFullscreen | Set to false to always use faux fullscreen (the same as what is used for unsupported browsers). | true |
| parser | [Marked](https://github.com/chjj/marked) is the only parser built into EpicEditor, but you can customize or toggle this by passing a parsing function to this option, for example parser: MyCustomParser.parse | marked |
| focusOnLoad | If true, the editor will focus on load. | false |
| file.name | If no file exists with this name, a new one will be made; otherwise the existing one will be opened. | container ID |
| file.defaultContent | The content to show if no content exists for a file. NOTE: if the textarea option is used, the textarea's value will take precedence over defaultContent. | '' (empty string) |
| file.autoSave | How often to auto save the file, in milliseconds. Set to false to turn it off. | 100 |
| theme.base | The base styles, such as the utility bar with the buttons. | themes/base/epiceditor.css |
| theme.editor | The theme for the editor, which is the area you type into. | themes/editor/epic-dark.css |
| theme.preview | The theme for the previewer. | themes/preview/github.css |
| button | If set to false, will remove all buttons. | All buttons set to true. |
| button.preview | If set to false, will remove the preview button. | true |
| button.fullscreen | If set to false, will remove the fullscreen button. | true |
| button.bar | If true or "show", any defined buttons will always be visible. If false or "hide", any defined buttons will never be visible. If "auto", buttons will usually be hidden, but shown whenever the mouse is moved. | "auto" |
| shortcut.modifier | The key to hold while pressing the other shortcut keys to trigger a key combo. | 18 (Alt key) |
| shortcut.fullscreen | The shortcut to open fullscreen. | 70 (F key) |
| shortcut.preview | The shortcut to toggle the previewer. | 80 (P key) |
| string.togglePreview | The tooltip text that appears when hovering the preview icon. | Toggle Preview Mode |
| string.toggleEdit | The tooltip text that appears when hovering the edit icon. | Toggle Edit Mode |
| string.toggleFullscreen | The tooltip text that appears when hovering the fullscreen icon. | Enter Fullscreen |
| autogrow | Whether to autogrow EpicEditor to fit its contents. If autogrow is desired, specify either true to use the default autogrow settings, or an object to define custom settings. | false |
| autogrow.minHeight | The minimum height (in pixels) that the editor should ever shrink to. This may also take a function that returns the desired minHeight if it is not a constant, or a falsy value if no minimum is desired. | 80 |
| autogrow.maxHeight | The maximum height (in pixels) that the editor should ever grow to. This may also take a function that returns the desired maxHeight if it is not a constant, or a falsy value if no maximum is desired. | false |
| autogrow.scroll | Whether the page should scroll to keep the caret in the same vertical place while autogrowing (recommended for mobile in particular). | true |


load([callback])

Loads the editor by creating an iframe and inserting it into the DOM. Triggers the load event, or you can provide a callback.

editor.load(function () {
  console.log("Editor loaded.");
});


unload([callback])

Unloads the editor by removing the iframe. Keeps any options and file contents so you can easily call .load() again. Triggers the unload event, or you can provide a callback.

editor.unload(function () {
  console.log("Editor unloaded.");
});


getElement(element)

Grabs an editor element for easy DOM manipulation. See the Themes section below for more on the layout of EpicEditor elements.

  • container: The element given at setup in the options.
  • wrapper: The wrapping <div> containing the two iframes (editor and previewer).
  • wrapperIframe: The iframe containing the wrapper element.
  • editor: The #document of the editor iframe (i.e. you could do editor.getElement('editor').body).
  • editorIframe: The iframe containing the editor element.
  • previewer: The #document of the previewer iframe (i.e. you could do editor.getElement('previewer').body).
  • previewerIframe: The iframe containing the previewer element.

someBtn.onclick = function () {
  console.log(editor.getElement('editor').body.innerHTML); // Returns the editor's content
};


is(state)

Returns a boolean for the requested state. This is useful when, for example, you need to know whether the editor has loaded yet. The supported states are:

  • loaded
  • unloaded
  • edit
  • preview
  • fullscreen

fullscreenBtn.onclick = function () {
  if (!editor.is('loaded')) { return; }
  editor.enterFullscreen();
};


open(filename)

Opens a client side storage file into the editor.

Note: This does not open files on your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

openFileBtn.onclick = function () {
  editor.open('some-file'); // Opens a file when the user clicks this button
};


importFile(filename, content)

Imports a string of content into a client side storage file. If the file already exists, it will be overwritten. This is useful if you want to inject a bunch of content via AJAX. It also runs .open() automatically after the import.

Note: This does not import files on your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

importFileBtn.onclick = function () {
  editor.importFile('some-file', "#Imported markdown\nFancy, huh?"); // Imports a file when the user clicks this button
};


exportFile([filename], [type])

Returns the plain text of the client side storage file or, if given a type, the content in the specified type. If you leave both parameters null, it returns the current document's content in plain text. The supported export file types are:

  • text (default)
  • html
  • json (includes metadata)
  • raw (warning: this is browser specific!)

Note: This does not export files to your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

syncWithServerBtn.onclick = function () {
  var theContent = editor.exportFile();
  saveToServerAjaxCall('/save', {data:theContent}, function () {
    console.log('Data was saved to the database.');
  });
};

rename(oldName, newName)

Renames a client side storage file.

Note: This does not rename files on your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

renameFileBtn.onclick = function () {
  var newName = prompt('What do you want to rename this file to?');
  editor.rename('old-filename.md', newName); // Prompts a user and renames a file on button click
};


save()

Manually saves a file to client side storage (localStorage by default). EpicEditor saves continuously every 100ms by default, but if you set autoSave in the options to false or to a longer interval, it's useful to save manually.

Note: This does not save files to your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

saveFileBtn.onclick = function () {
  editor.save();
};


remove(name)

Deletes a client side storage file.

Note: This does not remove files from your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

removeFileBtn.onclick = function () {
  editor.remove('some-file');
};

getFiles([name], [excludeContent])

If no name is given, it returns an object containing the names and metadata of all client side storage file objects. If a name is specified, it returns just the metadata of that single file object. If excludeContent is true, the content is removed from the returned object, which is useful when you just want a list of files or some metadata. If excludeContent is false (the default), it returns a content property per file in plain text format.

Note: This does not get files from your server or machine (yet). This simply looks in localStorage where EpicEditor stores drafts.

var files = editor.getFiles();
for (var x in files) {
  console.log('File: ' + x); // Logs the name of each file
}

on(event, handler)

Sets up an event handler (callback) for a specified event. For all event types, see the Events section below.

editor.on('unload', function () {
  console.log('Editor was removed');
});


emit(event)

Fires an event programmatically, similar to jQuery's .trigger().

editor.emit('unload'); // Triggers the handler provided in the "on" method above

removeListener(event, [handler])

Allows you to remove all listeners for an event, or just the specified one.

editor.removeListener('unload'); //The handler above would no longer fire


preview()

Puts the editor into preview mode.

previewBtn.onclick = function () {
  editor.preview();
};


edit()

Puts the editor into edit mode.

editBtn.onclick = function () {
  editor.edit();
};


focus()

Puts focus on the editor or previewer (whichever is visible). It works just like calling focus() on a plain input, as in someInput.focus(). The benefit of using this method, however, is that it handles cross-browser issues and focuses on the visible view (edit or preview).

showEditorBtn.onclick = function () {
  editorWrapper.style.display = 'block'; // switch from being hidden from the user
  editor.focus(); // Focus and allow user to start editing right away
};


enterFullscreen()

Puts the editor into fullscreen mode.

Note: due to browser security restrictions, calling enterFullscreen() programmatically, outside of a user-initiated event, will not trigger native fullscreen. Native fullscreen can only be triggered by a user interaction such as mousedown or keyup.

enterFullscreenBtn.onclick = function () {
  editor.enterFullscreen();
};


exitFullscreen()

Closes fullscreen mode.

exitFullscreenBtn.onclick = function () {
  editor.exitFullscreen();
};

reflow([type], [callback])

reflow() allows you to "reflow" the editor in its container. For example, if you increase the height of the wrapping element and want the editor to resize as well, call reflow() and the editor will resize to fit. You can pass 'width' or 'height' as the first parameter to constrain the reflow to that dimension.

It also accepts a callback if you'd like to do something after the resize has finished; the callback receives an object containing the new width and/or height. You can also listen for the reflow event, which gives you the new size as well.

Note: If you call reflow() or reflow('width') and you have a fluid-width container, EpicEditor will no longer be fluid, because reflowing the width sets an inline style on the editor.

// For an editor that takes up the whole browser window:
window.onresize = function () {
  editor.reflow();
};

// Constrain the reflow to just height:
someDiv.resizeHeightHandle = function () {
  editor.reflow('height');
};

// Same as the first example, but this has a callback
window.onresize = function () {
  editor.reflow(function (data) {
    console.log('width: ', data.width, ' ', 'height: ', data.height);
  });
};
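The optional type argument decides which dimensions get recomputed. The sketch below models that rule on plain numbers: with no type both width and height are reported, with 'width' or 'height' only that one is. reflowSize is a hypothetical helper, not EpicEditor code; it takes the type as an explicit argument rather than mirroring reflow()'s optional-callback signature, and the numeric values are an assumption (the real editor may report sizes differently).

```javascript
// Models which dimensions reflow([type]) recomputes, given the wrapper's
// current size. Hypothetical helper for illustration only.
function reflowSize(type, wrapper) {
  if (type !== undefined && type !== 'width' && type !== 'height') {
    throw new Error("type must be 'width', 'height', or omitted");
  }
  var result = {};
  if (type === undefined || type === 'width') result.width = wrapper.width;
  if (type === undefined || type === 'height') result.height = wrapper.height;
  return result; // the shape a reflow callback would receive
}

var wrapper = { width: 800, height: 600 };
console.log(reflowSize(undefined, wrapper)); // { width: 800, height: 600 }
console.log(reflowSize('height', wrapper));  // { height: 600 }
```

This also shows why reflow('height') is the safe choice for fluid-width containers: it never touches the width.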


Events

You can hook into specific events in EpicEditor with on(), such as when a file is created, removed, or updated. Below is a complete list of the currently supported events and their descriptions.

| Event Name | Description |
|---|---|
| create | Fires whenever a new file is created. |
| read | Fires whenever a file is read. |
| update | Fires whenever a file is updated. |
| remove | Fires whenever a file is deleted. |
| load | Fires when the editor loads via load(). |
| unload | Fires whenever the editor is unloaded via unload(). |
| preview | Fires whenever the previewer is opened (excluding fullscreen) via preview() or the preview button. |
| edit | Fires whenever the editor is opened (excluding fullscreen) via edit() or the edit button. |
| fullscreenenter | Fires whenever the editor enters fullscreen via enterFullscreen() or the fullscreen button. |
| fullscreenexit | Fires whenever the editor exits fullscreen via exitFullscreen() or the fullscreen button. |
| save | Fires whenever save() is called manually, or implicitly by importFile() or open(). |
| autosave | Fires whenever the autoSave interval fires and the file contents have been updated since the last save. |
| open | Fires whenever a file is opened, either loaded automatically by EpicEditor or when open() is called. |
| reflow | Fires whenever reflow() is called; the new dimensions are passed to the callback. Also fires every time autogrow resizes the editor. |
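The autosave row in the table above says the event fires only when the content has changed since the last save. Here is a small model of that rule. createAutosaver and tick are hypothetical names, the interval timer is replaced by explicit tick() calls so the sketch runs anywhere, and the assumption that a manual save resets the comparison point follows from "since the last save" rather than from EpicEditor's source.

```javascript
// Models the documented autosave rule: on each interval tick, save and report
// 'autosave' only if the content differs from what was last saved.
function createAutosaver(getContent, onAutosave) {
  var lastSaved = getContent();
  return {
    // In real code this would be driven by a timer such as setInterval.
    tick: function () {
      var current = getContent();
      if (current === lastSaved) return false; // unchanged: no event
      lastSaved = current;
      onAutosave(current);
      return true;
    },
    // Assumption: an explicit save also resets the comparison point.
    save: function () {
      lastSaved = getContent();
    }
  };
}

var text = 'Hello';
var saver = createAutosaver(function () { return text; }, function (content) {
  console.log('autosave:', content);
});
saver.tick();        // no output: nothing changed yet
text = 'Hello world';
saver.tick();        // prints "autosave: Hello world"
saver.tick();        // no output: already saved
```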


Themes

Theming is easy in EpicEditor. There are three different <iframe>s, which means styles won't leak between the "chrome" of EpicEditor, the previewer, and the editor; each one is like its own web page. In the themes directory you'll see base, preview, and editor. The base styles are for the "chrome" of the editor, which contains elements such as the utility bar with the icons. The editor styles apply to the contents of the editor <iframe>, and the preview styles apply to the preview <iframe>.

The HTML of a generated editor (excluding contents) looks like this:

<div id="container">
  <iframe id="epiceditor-instance-id">
    <link type="text/css" id="" rel="stylesheet" href="epiceditor/themes/base/epiceditor.css" media="screen">
    <div id="epiceditor-wrapper">
      <iframe id="epiceditor-editor-frame">
        <link type="text/css" rel="stylesheet" href="epiceditor/themes/editor/epic-dark.css" media="screen">
        <body contenteditable="true">
          <!-- raw content -->
        </body>
      </iframe>
      <iframe id="epiceditor-previewer-frame">
        <link type="text/css" rel="stylesheet" href="epiceditor/themes/preview/github.css" media="screen">
        <div id="epiceditor-preview">
          <!-- rendered html -->
        </div>
      </iframe>
      <div id="epiceditor-utilbar">
        <span title="Toggle Preview Mode" class="epiceditor-toggle-btn epiceditor-toggle-preview-btn"></span>
        <span title="Enter Fullscreen" class="epiceditor-fullscreen-btn"></span>
      </div>
    </div>
  </iframe>
</div>
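The generated HTML shows each theme stylesheet being loaded into its own <iframe>. The sketch below maps a base path plus a theme object onto those three stylesheet URLs. resolveThemeUrls is a hypothetical helper, and the { base, editor, preview } shape of the theme object is an assumption chosen to mirror the three theme directories, not a claim about EpicEditor's configuration API.

```javascript
// Hypothetical helper: maps an assumed { base, editor, preview } theme object
// onto the three <iframe>s that load each stylesheet.
function resolveThemeUrls(basePath, theme) {
  // Join with exactly one slash between the base path and the theme path.
  function join(base, path) {
    return base.replace(/\/+$/, '') + '/' + path.replace(/^\/+/, '');
  }
  return {
    chrome: join(basePath, theme.base),      // epiceditor-instance-id (wrapper + utility bar)
    editor: join(basePath, theme.editor),    // epiceditor-editor-frame
    previewer: join(basePath, theme.preview) // epiceditor-previewer-frame
  };
}

var urls = resolveThemeUrls('epiceditor', {
  base: '/themes/base/epiceditor.css',
  editor: '/themes/editor/epic-dark.css',
  preview: '/themes/preview/github.css'
});
console.log(urls.previewer); // epiceditor/themes/preview/github.css
```

Because each stylesheet lands in a separate document, a rule in the preview theme cannot restyle the utility bar, and vice versa.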

Custom Parsers

EpicEditor is set up to allow you to use any parser that accepts and returns a string. This means you can use any flavor of Markdown, process Textile, or even create a simple HTML editor/previewer (parser: false). The possibilities are endless. Just make the parser available, pass its parsing function to EpicEditor's parser setting, and you should be all set.
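Since any string-in, string-out function can serve as the parser, here is a deliberately tiny "plain text" parser as an illustration: it escapes HTML, wraps blank-line-separated blocks in <p>, and turns single newlines into <br>. plainTextParser is a hypothetical name. The commented-out line shows passing it as the parser setting; it is left as a comment because it needs a browser page with EpicEditor loaded.

```javascript
// A toy parser: string in, HTML string out. Escapes markup, then wraps each
// blank-line-separated block in <p>, turning single newlines into <br>.
function plainTextParser(text) {
  var escaped = text
    .replace(/&/g, '&amp;') // escape & first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return escaped
    .split(/\n\s*\n/) // paragraphs are separated by blank lines
    .filter(function (block) { return block.trim() !== ''; })
    .map(function (block) {
      return '<p>' + block.trim().replace(/\n/g, '<br>') + '</p>';
    })
    .join('\n');
}

// In a page where EpicEditor is loaded (assumed setup):
// var editor = new EpicEditor({ parser: plainTextParser }).load();

console.log(plainTextParser('a < b\nstill a\n\nnext'));
// <p>a &lt; b<br>still a</p>
// <p>next</p>
```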

For even more customization or optimization, you can replace the default built-in parser at build time. Running jake build parser=path/to/parser.js will override the default Marked build and replace it with your custom script.

See the custom parsers wiki page for more.


Support

If you're having any problems with EpicEditor, feel free to open a new ticket. Ask us anything and we'll try to help however we can. If you need a little more help implementing EpicEditor on your site, we've teamed up with CodersClan to offer support.


Contributing

Contributions are greatly encouraged and appreciated. For more on ways to contribute, see the Contributing Guide on the wiki.


Credits

EpicEditor relies on Marked to parse Markdown and is brought to you in part by Oscar Godson and John Donahue. Special thanks to Adam Bickford for the bug fixes and for being the QA for pull requests. Lastly, huge thanks to Sebastian Nitu for the amazing logo and doc styles.

Author: OscarGodson
Source Code: https://github.com/OscarGodson/EpicEditor 
License: MIT License
