
A Report on Object Detection and Caption Generator

Amar Anand, Himanshu Kumar Srivastava

Abstract


Image Caption Generator is an application that generates natural-language captions for images: the image's semantic content is extracted and rendered in plain language. Captioning is a demanding task that combines image processing and computer vision, since the system must detect objects, people, and animals and establish the relationships among them. The goal of this article is to use deep learning to detect and recognize the objects in a given image and to generate a meaningful caption for it. A Regional Object Detector (RODe) is used to detect objects, recognize them, and provide captions, and the proposed method applies deep learning to improve on existing image caption generation systems. Experiments on the Flickr 8k dataset, implemented in the Python programming language, demonstrate the proposed approach.
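The detect-then-caption pipeline described above can be illustrated with a minimal sketch. Note that this is not the article's RODe implementation: the detector is mocked, and the caption is assembled from detected labels by a simple template rather than a trained language model; `detect_objects`, `generate_caption`, and the confidence threshold are illustrative names chosen here, not components named in the article.

```python
# Toy sketch of a detect-then-caption pipeline (assumptions noted above).
# A real system would run a CNN-based regional object detector and feed
# the detections into a learned decoder (e.g. an RNN/LSTM).

def detect_objects(detections):
    """Stand-in for a regional object detector.

    In a real pipeline this would take pixel data and return labeled
    boxes with confidence scores; here the detections are passed through.
    """
    return detections  # e.g. [("dog", 0.94), ("ball", 0.81)]

def generate_caption(detections, confidence_threshold=0.5):
    """Assemble a caption from detected labels above a confidence cutoff."""
    labels = [label for label, score in detections
              if score >= confidence_threshold]
    if not labels:
        return "no salient objects detected"
    if len(labels) == 1:
        return f"an image of a {labels[0]}"
    # Join all but the last label, then append the final one with "and".
    return ("an image of a " + ", a ".join(labels[:-1])
            + f" and a {labels[-1]}")

caption = generate_caption(detect_objects([("dog", 0.94), ("ball", 0.81)]))
print(caption)  # -> an image of a dog and a ball
```

A trained captioner replaces the template with a decoder that scores full sentences, but the control flow (detect, filter by confidence, decode) is the same.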







DOI: https://doi.org/10.37628/ijippr.v7i2.745
