Changelog

0.5 -- 2021-12-23

  • Add byte scheme

0.4.0 -- 2021-08-03

  • Add support for class scheme -- for multi-class classification field

0.3.2

  • Feature: shrink an existing vocabulary to a given dataset (useful for parent-child transfer)

0.3.1

  • Fix nlcodec CLI bug
  • Improve help messages with epilog
  • Add nlcodec-learn interface for learning vocabulary over PySpark

0.3.0

  • add nlcodec-freqs CLI to setup.py
  • log time and memory usage for learn task
  • log BPE merge operations once every 2s instead of all operations
  • use __slots__: ~25% faster, 30% less memory for BPE with 3M word types
  • nlcodec.db.core with Db and MultipartDb
  • nlcodec.db.batch with Batch and BatchIterable
  • CLI nlcodec.learn for learning BPE using pyspark
  • CLI nlcodec.bitextdb to build a database from parallel text

0.2.4 : 2020-07-14

  • fix issue with name as class property (#24, #25)

0.2.3 : 2020-07-07

  • Option to supply preconfigured spark session object
  • Add docs

0.2.2 : 2020-06-14

  • Option to accept term frequencies as input
  • PySpark backend to compute word and char frequencies
  • --min-co-ev of BPE is now a CLI arg

0.2.1 : 2020-05-30

  • FIX: find_packages() in setup.py file to include nested packages

0.2.0 : 2020-04-17

  • uploaded to PyPI: pip install nlcodec
  • public repository with Apache License 2.0