In this post, I show how you can use a custom slot with Amazon Lex to take free text as input, submit it to Amazon Translate for translation, and then present the result to the user. Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Amazon Lex, powered by the same deep learning technologies as Amazon Alexa, is a service for building conversational interfaces into any application using voice and text.

The solution is scalable, built on serverless computing technologies, and designed to allow secure, anonymous access to the translation UI via Amazon Cognito. I take advantage of an existing AWS CloudFormation script and Amazon CloudFront to easily create a web-based implementation of the translator bot.

In this post, I create an intent in Amazon Lex for a translation action. This intent prompts the user for a source and target language, for example English to Spanish, followed by a text string to translate. Users are free to switch languages at any time during the interaction with Amazon Lex. The solution in the following illustration makes full use of serverless computing technologies, enabling it to scale seamlessly to thousands of users without further engineering effort.

To follow the procedures in this blog post, you need the following:

- An AWS account and access to a Region that supports Amazon Lex and Amazon Translate.
- Familiarity with Python for the AWS Lambda function.

Downloading and preprocessing your dataset

As a training dataset, I use the publicly available, open source text data from "The Adventures of Sherlock Holmes" by Arthur Conan Doyle. The data populates a custom slot in Lex that can accept free-text input. You can download this text file in Plain Text UTF-8 format and save it as sherlock.txt, or you can use any other training data for the custom slot if you want.

The core requirement for preprocessing is to have many different examples of varying input text. There is a secondary requirement that each slot value can be up to 140 UTF-8 characters long (cf. Lex Slot Type Limits). The rest of this post assumes you have used the Sherlock Holmes e-book.

From sherlock.txt, I extract sentences of a reasonable length, but fewer than 140 characters, to populate the slots in Lex. I want to clean up the data and restrict the sample size to 100 randomly selected sentences, each shorter than 140 characters. I use the Python script below to create a zipped JSON file of the 100 slot entries.

The script uses the Natural Language Toolkit library (NLTK) to tokenize sentences from the Sherlock Holmes e-book so that each individual sentence can be stored as a separate enumeration value. You might need to install this library before you run the script. To install it on a Linux operating system, you can use, for example:

```shell
pip install nltk
```

Before running the script, please download "The Adventures of Sherlock Holmes" from the Open Source Project Gutenberg site in Plain Text UTF-8 format and save it as sherlock.txt. (The exact 'useful sentence' filter and the JSON layout below are illustrative reconstructions.)

```python
import json
import random
import re
import zipfile

import nltk

nltk.download('punkt')  # sentence tokenizer model used by sent_tokenize

# Open source text data - saved as 'sherlock.txt'
with open('sherlock.txt', 'r') as trainingfile:
    text = trainingfile.read()

# Use NLTK to put the text into a list of sentences
sentences = nltk.sent_tokenize(text)

# Pull out 'useful' sentences which are up to 140 characters in length
# (collapsing internal line breaks and whitespace first)
useful = [re.sub(r'\s+', ' ', s) for s in sentences if len(s) <= 140]

# Restrict the sample to 100 randomly selected sentences and store each
# one as a separate enumeration value in a zipped JSON file
entries = [{'value': s} for s in random.sample(useful, 100)]
with zipfile.ZipFile('slot_entries.zip', 'w') as zf:
    zf.writestr('slot_entries.json', json.dumps(entries))
```
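The fulfillment side of the translation intent can be sketched as a short AWS Lambda handler. This is a minimal illustration rather than the post's full implementation: the slot names (`sourceLang`, `targetLang`, `text`) and the Lex V1 event/response shapes are assumptions, and in practice the user's spoken language choices would need to be mapped to Amazon Translate language codes such as `en` and `es`.

```python
def close_response(message):
    """Build a Lex V1 'Close' dialog action carrying a plain-text message."""
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {'contentType': 'PlainText', 'content': message},
        }
    }


def lambda_handler(event, context):
    # boto3 is imported lazily so this module can be loaded without AWS libs.
    import boto3

    # Lex V1 passes the resolved slot values under currentIntent.slots;
    # the slot names here are assumptions for illustration.
    slots = event['currentIntent']['slots']

    translate = boto3.client('translate')
    result = translate.translate_text(
        Text=slots['text'],
        SourceLanguageCode=slots['sourceLang'],
        TargetLanguageCode=slots['targetLang'],
    )
    return close_response(result['TranslatedText'])
```

Returning a `Close` dialog action ends the conversational turn, so the user can immediately start another translation, switching languages if they wish.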