Let's see what these arguments mean.

    # Find predictions and apply non-maxima suppression
    (boxes, confidence_val) = predictions(scores, geometry)
    boxes = non_max_suppression(np.array(boxes), probs=confidence_val)

Getting final bounding boxes after non-max suppression

Now that we have derived the bounding boxes after applying non-max suppression, we want to draw them on the image and extract the text from each detected box. We do this using Tesseract.
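For intuition about what non-max suppression does to overlapping detections, here is a minimal pure-Python sketch. It is not the non_max_suppression helper called in the snippet; the names iou and simple_nms, the (x1, y1, x2, y2) box layout, and the 0.3 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= overlap_thresh for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


# Two heavily overlapping boxes around the same word, plus one separate box.
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 40, 300, 90)]
scores = [0.95, 0.80, 0.90]
final_boxes = simple_nms(boxes, scores)
```

The lower-scoring duplicate is dropped while the separate box survives, which is the behaviour the detection step relies on before recognition.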

Data and Preprocessing

CNN Architectures for Text Classification

We experimented with the following 2 architectures.

EAST can detect text both in images and in video. As mentioned in the paper, it runs in near real time at 13 FPS on 720p images with high text detection accuracy. Another benefit of this technique is that its implementation is available in OpenCV 3.4.2 and OpenCV 4. We will see this EAST model in action along with text recognition.

Text in an image can be detected with single-shot detection techniques such as YOLO (You Only Look Once) as well as with region-based text detection techniques.

Visualizing Results in TensorBoard

Tesseract's capability was mostly limited to structured text. It performed quite poorly on unstructured text with significant noise. Further development of Tesseract has been sponsored by Google since 2006.

Deep-learning based methods perform better on unstructured data. Tesseract 4 added a deep-learning based OCR engine built on an LSTM network (a kind of recurrent neural network) that focuses on line recognition, while still supporting the legacy Tesseract 3 engine, which works by recognizing character patterns. The latest stable version, 4.1.0, was released on July 7, 2019. This version is significantly more accurate on unstructured text as well.

Convolutional Neural Network (CNN) based image classifiers became popular after a CNN based method won the ImageNet

Building Custom Deep Learning Based OCR models

And there are many others, such as this one for Chinese characters, this one for CAPTCHA, or this one for handwritten words.

  1. Structured text: text in a typed document, with a standard background, proper rows, a standard font, and a mostly dense layout.
Convolutional Neural Networks for text classification built on top of Gensim's well known word2vec:

    cnn.predict(X)        # Predict most likely class.
    cnn.predict_proba(X)  # Per class probabilities
    # Display the image with bounding boxes and recognized text.
    # Assumes cv2, plt (matplotlib.pyplot), orig and results from the earlier steps.
    orig_image = orig.copy()

    # Loop over the results and draw them on the image
    for ((start_X, start_Y, end_X, end_Y), text) in results:
        # Print the text detected by Tesseract
        print("{}\n".format(text))

        # Strip non-ASCII characters before drawing the text with OpenCV
        text = "".join([x if ord(x) < 128 else "" for x in text]).strip()
        cv2.rectangle(orig_image, (start_X, start_Y), (end_X, end_Y), (0, 0, 255), 2)
        cv2.putText(orig_image, text, (start_X, start_Y - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    plt.imshow(orig_image)
    plt.title('Output')
    plt.show()

Display image with bounding box and recognized text

Results

The code above uses the OpenCV EAST model for text detection and Tesseract for text recognition. The Tesseract PSM has been set according to the image. It is important to note that Tesseract normally requires a clear image to work well.

In our current implementation, we did not consider rotated bounding boxes because they are complex to implement. In real scenarios where the text is rotated, the above code will not work well. Also, whenever the image is not very clear, Tesseract will have difficulty recognizing the text properly.
Convolutional Recurrent Neural Network (CRNN) is a combination of CNN, RNN, and CTC (Connectionist Temporal Classification) loss for image-based sequence recognition tasks, such as scene text recognition and OCR. The network architecture has been taken from this paper published in 2015.
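To give a feel for the CTC part of a CRNN, the sketch below shows greedy (best-path) decoding of per-frame labels: repeated labels are collapsed and blanks are dropped. This is one common way to read out a CTC-trained network, not something taken from the paper referenced above; the function name, the blank index 0, and the toy alphabet are assumptions for illustration.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop the blank label."""
    decoded = []
    previous = blank
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical alphabet: index 0 is the CTC blank.
alphabet = "-helo"
# One label per image column (frame), e.g. the argmax of the RNN output.
frames = [1, 1, 2, 0, 3, 3, 0, 3, 4, 4]
text = "".join(alphabet[i] for i in ctc_greedy_decode(frames))
```

The blank between the two runs of "l" is what lets the decoder output a doubled letter instead of merging them.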

Using the feature vector from max-pooling (with dropout applied) we can generate predictions by doing a matrix multiplication and picking the class with the highest score. We could also apply a softmax function to convert raw scores into normalized probabilities, but that wouldn’t change our final predictions.

That’s it, we’re done with our network definition. The full network definition code is available here. To get the big picture we can also visualize the network in TensorBoard:

There are lots of datasets available in English but it's harder to find datasets for other languages. Different datasets present different tasks to be solved. Here are a few examples of datasets commonly used for machine learning OCR problems.

Now we're ready to build our convolutional layers followed by max-pooling. Remember that we use filters of different sizes. Because each convolution produces tensors of different shapes we need to iterate through them, create a layer for each of them, and then merge the results into one big feature vector.

Here, we are separately keeping track of summaries for training and evaluation. In our case these are the same quantities, but you may have quantities that you want to track during training only (like parameter update values). tf.merge_summary is a convenience function that merges multiple summary operations into a single operation that we can execute.

Here, batch_iter is a helper function I wrote to batch the data, and tf.train.global_step is a convenience function that returns the value of global_step. The full code for training is also available here.

Because this is an educational post I decided to simplify the model from the original paper a little.

This post is about optical character recognition (OCR) for text recognition in natural scene images. We will learn why it is a tough problem, which approaches are used to solve it, and the code that goes along with them.

Challenges in the OCR problem arise mostly from the attributes of the OCR task at hand. We can generally divide these tasks into two categories:

A Graph contains operations and tensors. You can use multiple graphs in your program, but most programs only need a single graph. You can use the same graph in multiple sessions, but not multiple graphs in one session. TensorFlow always creates a default graph, but you may also create a graph manually and set it as the new default, like we do below. Explicitly creating sessions and graphs ensures that resources are released properly when you no longer need them.

Tesseract was originally developed at Hewlett-Packard Laboratories between 1985 and 1994. In 2005, it was open-sourced by HP. As per Wikipedia:

Some of the applications are passport recognition, automatic number plate recognition, converting handwritten text to digital text, converting typed text to digital text, etc.

Text detection techniques are required to detect the text in the image and create a bounding box around the portion of the image containing text. Standard object detection techniques will also work here.

In TensorFlow, a Session is the environment you are executing graph operations in, and it contains state about Variables and queues. Each session operates on a single graph. If you don't explicitly use a session when creating variables and operations you are using the current default session created by TensorFlow. You can change the default session by executing commands within a session.as_default() block (see below).

The allow_soft_placement setting allows TensorFlow to fall back on a device that implements a given operation when the preferred device doesn’t exist. For example, if our code places an operation on a GPU and we run the code on a machine without a GPU, not using allow_soft_placement would result in an error. If log_device_placement is set, TensorFlow logs which devices (CPU or GPU) it places operations on. That’s useful for debugging. FLAGS are command-line arguments to our program.

The initialize_all_variables function is a convenience function that runs all of the initializers we’ve defined for our variables. You can also call the initializer of your variables manually. That’s useful if you want to initialize your embeddings with pre-trained values, for example.
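Fully connected layer: as with any multi-class classification, this step is handled in much the same way. Reshape the pooled matrix to two dimensions and compute the loss with tf.nn.sparse_softmax_cross_entropy_with_logits().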

Here, tf.nn.softmax_cross_entropy_with_logits is a convenience function that calculates the cross-entropy loss for each class, given our scores and the correct input labels. We then take the mean of the losses. We could also use the sum, but that makes it harder to compare the loss across different batch sizes and train/dev data.
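As a rough illustration of the quantity being computed, here is a pure-Python sketch of softmax cross-entropy averaged over a batch. It does not call TensorFlow; the helper names and the toy scores and one-hot labels are made up for the example.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, one_hot_label):
    """Cross-entropy between the softmax of the scores and a one-hot label."""
    probs = softmax(scores)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))


def mean_loss(batch_scores, batch_labels):
    """Average the per-example losses so batches of different sizes are comparable."""
    losses = [cross_entropy(s, y) for s, y in zip(batch_scores, batch_labels)]
    return sum(losses) / len(losses)


# Two examples, two classes: the first is classified confidently and correctly,
# the second is a coin flip.
batch_scores = [[2.0, 0.0], [0.0, 0.0]]
batch_labels = [[1, 0], [0, 1]]
loss = mean_loss(batch_scores, batch_labels)
```

Taking the mean rather than the sum keeps the loss on the same scale whether it is computed on a training batch or on the whole dev set.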

The bounding box can be created around the text through the sliding window technique. However, this is a computationally expensive task. In this technique, a sliding window passes through the image to detect the text in that window, like a convolutional neural network. We try different window sizes so as not to miss text of different sizes. There is a convolutional implementation of the sliding window which can reduce the computational time.

The first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors using multiple filter sizes, for example sliding over 3, 4 or 5 words at a time. Next, we max-pool the result of the convolutional layer into a long feature vector, add dropout regularization, and classify the result using a softmax layer.
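To make "sliding over 3, 4 or 5 words at a time" concrete, the sketch below lists the word windows that a filter of each size would see. The helper name word_windows and the sample sentence are illustrative assumptions; a real model slides over embedding vectors rather than raw tokens.

```python
def word_windows(tokens, window_size):
    """All contiguous windows of `window_size` tokens, without padding."""
    return [
        tokens[i:i + window_size]
        for i in range(len(tokens) - window_size + 1)
    ]


tokens = "the movie was surprisingly good fun".split()
# One list of windows per filter size.
windows_by_size = {size: word_windows(tokens, size) for size in (3, 4, 5)}

for size, windows in windows_by_size.items():
    print(size, len(windows), windows[0])
```

A sentence of n words yields n - size + 1 windows for each filter size, so larger filters see fewer but wider contexts.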

To allow various hyperparameter configurations we put our code into a TextCNN class, generating the model graph in the init function.

W is our embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].

  1. tf.placeholder creates a placeholder variable that we feed to the network when we execute it at train or test time. The second argument is the shape of the input tensor. None means that the length of that dimension could be anything. In our case, the first dimension is the batch size, and using None allows the network to handle arbitrarily sized batches.
  2. The dataset we’ll use in this post is the Movie Review data from Rotten Tomatoes – one of the data sets also used in the original paper. The dataset contains 10,662 example review sentences, half positive and half negative. The dataset has a vocabulary of size around 20k. Note that since this data set is pretty small we’re likely to overfit with a powerful model. Also, the dataset doesn’t come with an official train/test split, so we simply use 10% of the data as a dev set. The original paper reported results for 10-fold cross-validation on the data.
  3. Dropout is perhaps the most popular method to regularize convolutional neural networks. The idea behind dropout is simple. A dropout layer stochastically “disables” a fraction of its neurons. This prevents neurons from co-adapting and forces them to learn individually useful features. The fraction of neurons we keep enabled is defined by the dropout_keep_prob input to our network. We set this to something like 0.5 during training, and to 1 (disable dropout) during evaluation.
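The dropout behaviour described in the last item can be sketched in a few lines of plain Python. This is an illustration rather than the TensorFlow implementation: the function name, the rng argument, and the choice to scale kept values by 1 / keep_prob (so the expected sum is unchanged) are assumptions of this sketch.

```python
import random


def dropout(activations, keep_prob, rng=random):
    """Keep each activation with probability keep_prob, zero it otherwise.

    Kept values are scaled by 1 / keep_prob; keep_prob of 1 disables dropout.
    """
    if keep_prob >= 1.0:
        return list(activations)
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


activations = [1.0, 2.0, 3.0, 4.0]
# Training: roughly half of the units are zeroed on each call.
train_out = dropout(activations, keep_prob=0.5, rng=random.Random(0))
# Evaluation: dropout is disabled and the activations pass through unchanged.
eval_out = dropout(activations, keep_prob=1.0)
```

Feeding keep_prob as an input, as the post does with dropout_keep_prob, is what lets the same graph be used for both training and evaluation.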

Some results indicate that max-pooling consistently outperforms average-pooling, that the ideal filter sizes are important but have to be judged task by task, and that whether or not regularization is used does not seem to make much difference in NLP tasks.

Once we have detected the bounding boxes containing the text, the next step is to recognize the text. There are several techniques for recognizing text. We will discuss some of the best techniques in the following section.

Many OCR implementations were available even before the boom of deep learning in 2012. While it was popularly believed that OCR was a solved problem, OCR is still a challenging problem, especially when text images are taken in an unconstrained environment.

TensorFlow's conv2d operation expects a 4-dimensional tensor with dimensions corresponding to batch, height, width and channel. The result of our embedding doesn't contain the channel dimension, so we add it manually, leaving us with a layer of shape [None, sequence_length, embedding_size, 1].

In this blog, we will be focusing more on unstructured text, which is a more complex problem to solve.

In this post we will implement a model similar to Yoon Kim's Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become a standard baseline for new text classification architectures.

We will use some of the images to show both text detection with the EAST method and text recognition with Tesseract 4. Let's see text detection and recognition in action in the following code. The article here proved to be a helpful resource in writing the code for this project.

Gets to 0.89 test accuracy after 2 epochs. 90s/epoch on Intel i5 2.4 GHz CPU. 10s/epoch on Tesla K40 GPU.

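The code delivered excellent results for all three images above. The text is clear and the background behind the text is uniform in these images.

A convolutional layer is built for each filter size, so there are multiple feature maps. An image is two-dimensional data made up of pixels, sometimes with three RGB channels, so its convolution kernels are at least two-dimensional. In a sense, word is to text as pixel is to image, so the kernel size and stride here are somewhat different.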

The Street View House Numbers dataset contains 73,257 digits for training, 26,032 digits for testing, and 531,131 additional digits as extra training data. The dataset includes 10 labels, which are the digits 0-9. The dataset differs from MNIST in that SVHN has images of house numbers against varying backgrounds. The dataset has bounding boxes around each digit instead of having several images of digits like in MNIST.

The Nanonets OCR API allows you to build OCR models with ease. You can upload your data, annotate it, set the model to train, and wait for predictions through a browser-based UI.


Take some time and try to understand the output shapes for each of these operations. You can also refer back to Understanding Convolutional Neural Networks for NLP to get some intuition. Visualizing the operations in TensorBoard may help as well (for specific filter sizes 3, 4 and 5 here):

I'm assuming that you are already familiar with the basics of Convolutional Neural Networks applied to NLP. If not, I recommend first reading Understanding Convolutional Neural Networks for NLP to get the necessary background.

Here, train_op is a newly created operation that we can run to perform a gradient update on our parameters. Each execution of train_op is a training step. TensorFlow automatically figures out which variables are "trainable" and calculates their gradients. By defining a global_step variable and passing it to the optimizer we allow TensorFlow to handle the counting of training steps for us. The global step will be automatically incremented by one every time you execute train_op.

The model performed pretty decently here. But some of the texts in bounding boxes are not recognized correctly. Numeric 1 could not be detected at all. There is a non-uniform background here, maybe generating a uniform background would have helped this case. Also, 24 is not properly bounded in the box. In such a case, padding the bounding box could help.
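The padding idea mentioned above could look roughly like the following. This is a sketch under assumptions, not code from the original pipeline: the function name pad_box, the fractional padding parameter, and the clamping to the image size are all choices made for this example.

```python
def pad_box(box, image_width, image_height, padding=0.05):
    """Grow a (start_x, start_y, end_x, end_y) box by a fraction of its size,
    clamped so it stays inside the image."""
    start_x, start_y, end_x, end_y = box
    dx = int((end_x - start_x) * padding)
    dy = int((end_y - start_y) * padding)
    return (
        max(0, start_x - dx),
        max(0, start_y - dy),
        min(image_width, end_x + dx),
        min(image_height, end_y + dy),
    )


# A 100x40 box grows by 25% of its width and height on each side.
padded = pad_box((100, 50, 200, 90), image_width=320, image_height=240, padding=0.25)
# A box near the border is clamped to the image bounds.
clamped = pad_box((5, 5, 300, 230), image_width=320, image_height=240, padding=0.25)
```

The padded region would then be cropped from the image and passed to the recognizer in place of the tight box.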

1. Introduction: TextCNN is an algorithm that uses convolutional neural networks to classify text, proposed by Yoon Kim in the paper "Convolutional Neural Networks for Sentence Classification" (see reference [1]). It dates from 2014.

When we instantiate our TextCNN models all the variables and operations defined will be placed into the default graph and session we've created above.

These are the standard ways to preprocess an image in a computer vision task. We will not be focusing on the preprocessing step in this blog.
I am talking about complex backgrounds, noise, lighting, different fonts, and geometrical distortions in the image.
Finally, we’re ready to write our training loop. We iterate over batches of our data, call the train_step function for each batch, and occasionally evaluate and checkpoint our model:
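The training loop iterates over batches produced by a batch_iter helper whose code is not shown in this section. The generator below is a hypothetical stand-in, not the author's implementation: its signature, the per-epoch shuffling, and the seed parameter are assumptions made so the batching idea can be tried in isolation.

```python
import random


def batch_iter(data, batch_size, num_epochs, shuffle=True, seed=None):
    """Yield lists of at most batch_size examples, passing over the data
    num_epochs times and optionally reshuffling at the start of each epoch."""
    data = list(data)
    rng = random.Random(seed)
    for _ in range(num_epochs):
        order = list(range(len(data)))
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield [data[i] for i in order[start:start + batch_size]]


examples = list(range(10))
batches = list(batch_iter(examples, batch_size=4, num_epochs=2, shuffle=False))
```

In a training loop, each yielded batch would be unpacked into inputs and labels and passed to the single-step training function.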

chuchus (in reply to jer8888): On the command line, run tensorboard --logdir=D:\tf_models\iris and open the URL it prints. For example, mine is http://yichu-amd:6006/.

In this era of digitization, storing, editing, indexing and finding information in a digital document is much easier than spending hours scrolling through printed, handwritten or typed documents.

Using our scores we can define the loss function. The loss is a measurement of the error our network makes, and our goal is to minimize it. The standard loss function for categorization problems is the cross-entropy loss.

    python ./code/upload-training.py

Step 7: Train Model

Once the images have been uploaded, begin training the model.

Another TensorFlow feature you typically want to use is checkpointing: saving the parameters of your model to restore them later on. Checkpoints can be used to continue training at a later point, or to pick the best parameter settings using early stopping. Checkpoints are created using a Saver object.

One utilizes the fully convolutional network to directly produce word or text-line level prediction. The produced predictions which could be rotated rectangles or quadrangles are further processed through the non-maximum-suppression step to yield the final output.

3.1 Hyperparameters and Training For all datasets we use: rectified linear units, filter windows (h) of 3, 4, 5 with 100 feature maps each, dropout rate (p) of 0.5, l2 constraint (s) of 3, and mini-batch size of 50. These values were chosen via a grid search on the SST-2 dev set.
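The settings quoted above can be collected in one place, for instance as a small configuration object. The class and field names below are hypothetical; only the values (filter windows of 3, 4 and 5, 100 feature maps each, dropout rate 0.5, l2 constraint 3, mini-batch size 50) come from the quoted passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextCNNConfig:
    """Hyperparameters from the quoted section 3.1 (field names are illustrative)."""
    filter_sizes: tuple = (3, 4, 5)
    num_filters: int = 100      # feature maps per filter size
    dropout_rate: float = 0.5
    l2_constraint: float = 3.0
    batch_size: int = 50

    @property
    def num_filters_total(self):
        """Length of the feature vector after concatenating all pooled outputs."""
        return self.num_filters * len(self.filter_sizes)

    @property
    def dropout_keep_prob(self):
        """Keep probability corresponding to the dropout rate."""
        return 1.0 - self.dropout_rate


config = TextCNNConfig()
```

With these values the concatenated feature vector has 300 entries, and a dropout rate of 0.5 corresponds to a keep probability of 0.5.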

This dataset consists of 3000 images in different settings (indoor and outdoor) and lighting conditions (shadow, light and night), with text in Korean and English. Some images also contain digits.

TensorFlow has a concept of summaries, which allow you to keep track of and visualize various quantities during training and evaluation. For example, you probably want to keep track of how your loss and accuracy evolve over time. You can also keep track of more complex quantities, such as histograms of layer activations. Summaries are serialized objects, and they are written to disk using a SummaryWriter.

    python ./code/train-model.py

Step 8: Get Model State

The model takes ~30 minutes to train. You will get an email once the model is trained. In the meantime, you can check the state of the model.

Here, W is our filter matrix and h is the result of applying the nonlinearity to the convolution output. Each filter slides over the whole embedding, but varies in how many words it covers. "VALID" padding means that we slide the filter over our sentence without padding the edges, performing a narrow convolution that gives us an output of shape [1, sequence_length - filter_size + 1, 1, 1]. Performing max-pooling over the output of a specific filter size leaves us with a tensor of shape [batch_size, 1, 1, num_filters]. This is essentially a feature vector, where the last dimension corresponds to our features. Once we have all the pooled output tensors from each filter size we combine them into one long feature vector of shape [batch_size, num_filters_total]. Using -1 in tf.reshape tells TensorFlow to flatten the dimension when possible.

The probability of keeping a neuron in the dropout layer is also an input to the network because we enable dropout only during training. We disable it when evaluating the model (more on that later).

    watch -n 100 python ./code/model-state.py

Step 9: Make Prediction

Once the model is trained, you can make predictions using the model.
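To see the narrow convolution and max-pooling shapes from the TextCNN description with actual numbers, here is a pure-Python sketch for a single sentence and a single filter. It omits the bias term and the batch and channel dimensions, and the function name and toy values are assumptions made for illustration.

```python
def conv_relu_maxpool(sentence, filt):
    """Narrow ('VALID') convolution over word positions, ReLU, then max over time.

    `sentence` is a list of embedding vectors; `filt` has one row per covered
    word and the same width as the embeddings.
    """
    filter_size = len(filt)
    activations = []
    for start in range(len(sentence) - filter_size + 1):
        window = sentence[start:start + filter_size]
        total = sum(
            w * x
            for word_vec, filt_row in zip(window, filt)
            for x, w in zip(word_vec, filt_row)
        )
        activations.append(max(0.0, total))  # ReLU nonlinearity
    return activations, max(activations)


# 5 words with 2-dimensional embeddings, and one filter covering 3 words.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, -1.0], [2.0, 0.0]]
filt = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
activations, pooled = conv_relu_maxpool(sentence, filt)
```

The filter produces sequence_length - filter_size + 1 = 3 activations, and max-pooling reduces them to a single feature; repeating this for every filter and filter size gives the long feature vector.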

Let's now define a function for a single training step, evaluating the model on a batch of data and updating the model parameters.

As we know in the deep learning world, there is no one solution that works for all. We will look at multiple approaches to the task at hand and work through one of them.

This example demonstrates the use of Convolution1D for text classification.

    git clone https://github.com/NanoNets/nanonets-ocr-sample-python
    cd nanonets-ocr-sample-python
    sudo pip install requests
    sudo pip install tqdm

Step 2: Get your free API Key

Get your free API Key from https://app.nanonets.com/#/keys

Deep Learning Based OCR for Text in the Wild
by Rahul Agarwal

We live in times when any organisation or company that wants to scale and stay relevant has to change how it looks at technology and adapt swiftly to changing landscapes. We already know how Google has digitized books, how Google Earth uses NLP to identify addresses, and how it is possible to read text in digital documents like invoices and legal paperwork.

You can play around with the code and try running the model with various parameter configurations. Code and instructions are available on Github.

Before we define the training procedure for our network we need to understand some basics about how TensorFlow uses Sessions and Graphs. If you're already familiar with these concepts feel free to skip this section.

CNNs are generally used in computer vision; however, it turns out that CNNs applied to certain NLP problems perform quite well. Let's briefly see what happens when we use a CNN on text data.

