|Graphics and Image Processing|
The Oscilloscope Creates the First Computer Graphics
In 1950, Ben F. Laposky created the first computer graphic using an electronic (analog) machine, an oscilloscope. His oscilloscope imagery was produced by manipulating electron beams displayed across the fluorescent face of the oscilloscope's cathode-ray tube and then recording them onto high-speed film using special lenses. In 1957 he added tinted filters to imbue the photographs with striking colors. He would set up as many as 70 controls on up to 60 oscilloscopes to create his designs. The resulting mathematical curves were similar to Lissajous waveforms, and basic waves were manipulated into elegantly rhythmic designs called "oscillons", in a process he described as "analogous to the production of music by an orchestra."
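Laposky's oscillons were close cousins of Lissajous figures, which arise when two sine waves of different frequency and phase drive an oscilloscope's horizontal and vertical deflection. A minimal sketch in Python (the function name and parameters are illustrative, not Laposky's):

```python
import math

def lissajous(a, b, delta, n=1000):
    """Sample a Lissajous curve: x = sin(a*t + delta), y = sin(b*t).

    The frequency ratio a/b and phase delta control the figure's shape,
    much as Laposky's oscilloscope settings shaped his oscillons.
    """
    points = []
    for i in range(n):
        t = 2 * math.pi * i / n
        points.append((math.sin(a * t + delta), math.sin(b * t)))
    return points

# A 1:2 frequency ratio with a 90-degree phase shift traces a
# figure-eight-like curve.
curve = lissajous(1, 2, math.pi / 2)
```

Plotting these points (or feeding the two sinusoids to a real oscilloscope's X and Y inputs) reproduces the closed, rhythmic figures the text describes.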
Oscillon photographs were often accompanied by the electronic synthesizer music made popular by Robert Moog, a contemporary of Laposky. Laposky's oscillographic art was shown at more than 200 exhibitions in the US and abroad before the emergence of computer graphics in the mid-1960s.
"Electronic Abstractions are a new kind of abstract art. They are beautiful design compositions formed by the combination of electrical wave forms as displayed on a cathode-ray oscilloscope. The exhibit consists of 50 photographs of these patterns. A wide variety of shapes and textures is included. The patterns all have an abstract quality, yet retain a geometrical precision. They are related to various mathematical curves, the intricate tracings of the geometric lathes and pendulum patterns, but show possibilities far beyond these sources of design." — Sanford Museum, Gallery notes for Electronic Abstractions, 1952
The Whirlwind, first demonstrated in 1951, was the first computer capable of displaying real-time text and graphics, using a large oscilloscope screen. Development of the Whirlwind began in 1945 under the leadership of Jay Forrester at MIT, as part of the Navy's Airplane Stability and Control Analyzer (ASCA) project. Whirlwind received positional data about aircraft from a radar station in Massachusetts. The Whirlwind programmers had created a series of data points, displayed on the screen, that represented the eastern coast of Massachusetts; when data was received from the radar, a symbol representing the aircraft was superimposed over the geographic drawing on the CRT screen.
In the early 1950s, Robert Everett designed an input device, called a light gun or light pen, to give operators a way of requesting identification information about an aircraft. When the light gun was pointed at the symbol for a plane on the screen, an event was sent to Whirlwind, which then displayed text about the plane's identification, speed, and direction on the screen. The Whirlwind computer was ultimately adopted by the U.S. Air Force for use in its new SAGE (Semi-Automatic Ground Environment) air defense system, which became operational in 1958 with more advanced display capabilities. The oscilloscope was also used to create the world's first interactive video game, Tennis for Two, built on an oscilloscope by William Higinbotham in 1958 to entertain guests at Brookhaven National Laboratory by simulating a tennis game. It was largely unknown outside of research and academic settings. A few years later, Steve Russell, a student at MIT, created Spacewar! on a PDP-1, and because it was the first game to achieve mainstream success, it is popularly referred to as the first video game, despite the fact that Tennis for Two came first. The first CRT display was a converted oscilloscope used to play Spacewar!. The first trackball (and thus, the first mouse) was a Spacewar! control at MIT. It is said that Ken Thompson salvaged a little-used PDP-7 and developed what became UNIX in part so that he could keep playing his space game, Space Travel.
The Rise of Graphics Processing Units (GPU)
A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. It is specifically designed to perform the complex mathematical and geometric calculations necessary to render graphics. In 1994, Sony coined the term GPU for its PlayStation to describe the Toshiba-designed 32-bit chip that handled graphics: control of the frame buffer and the drawing of polygons and textures. In 1999, NVIDIA popularized the term GPU as an acronym for graphics processing unit.
The foundation for what we know as 3D graphics was laid in the latter half of the 1970s. The first Atari computers, the 8-bit Atari 400 and Atari 800, introduced special integrated circuits for the display and acceleration of 2D graphics. ANTIC processed 2D display instructions using direct memory access (DMA). Like most video co-processors, it could generate playfield graphics (background, title screens, scoring displays), while the CTIA generated colors and movable objects. CTIA was later replaced by GTIA (George's Television Interface Adapter). Jay Miner, who designed the ANTIC and CTIA, later led chip development for the Commodore Amiga. The Amiga was the first mass-produced computer equipped with a special 2D accelerator (called a blitter).
In 1984 IBM introduced its first GPU, called the Professional Graphics Controller (PGC) or Professional Graphics Adapter (PGA). In essence, it was an expansion card that could accelerate both 2D and 3D graphics. It consisted of three separate boards connected together, and it had its own CPU and dedicated RAM (an Intel 8088 CPU and 320 KB of RAM). The PGC supported resolutions of up to 640 x 480 pixels, with 256 colors shown simultaneously on the display and a refresh rate of 60 frames per second. It was priced at $4,290 when first introduced. This GPU didn't achieve notable commercial success; however, the PGC is still considered an important milestone in the history of GPUs.
In 1985 the Amiga revolutionized the graphics market with its advanced design and circuitry. The specially designed chips that fully handled the creation and acceleration of graphics in the Amiga not only relieved the CPU of this task, but also gave the home computer very high graphics capabilities. You could say the Commodore Amiga was one of the first commercial computers equipped with what is now considered a GPU. Later, the fifth-generation gaming consoles, the PlayStation and Nintendo 64, were both equipped with 3D GPUs. In 1999 Nvidia introduced the TNT2, the successor to the RIVA, followed later that year by the GeForce 256. The GeForce 256 integrated a hardware Transform and Lighting (T&L) engine, taking the burden of creating complex graphics effects off of the main CPU. It was significantly faster than the previous generation, with the performance difference reaching 50 percent in many games, and it was the first GPU to fully support Direct3D. The integration of the T&L engine in GPUs also allowed Nvidia to enter the professional CAD market with its Quadro GPU line.
The modern era of the GPU began in 2007, with both Nvidia and ATI (since acquired by AMD) packing graphics cards with ever-more capabilities. The two companies took different tracks to general-purpose computing on GPUs (GPGPU). In 2007, Nvidia released its CUDA development environment, the earliest widely adopted programming model for GPU computing. Two years later, OpenCL became widely supported. This framework allows for the development of code for both GPUs and CPUs with an emphasis on portability. Thus, GPUs became more generalized computing devices. In 2010, Nvidia collaborated with Audi, using Tegra GPUs to power dashboards and enhance navigation and entertainment systems in their cars. These advancements in in-vehicle graphics processing helped push self-driving technology forward.
The CPU (central processing unit) has often been called the brains of the PC. Increasingly, that brain is being enhanced by another part of the PC, the GPU. The GPU goes well beyond basic graphics controller functions, and is a programmable and powerful computational device in its own right. While the GPU’s advanced capabilities were originally used primarily for 3D game rendering, those capabilities are being harnessed more broadly to accelerate computational workloads in areas such as financial modeling, cutting-edge scientific research and oil and gas exploration.
|Artificial Intelligence, ML, and DL|
Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)
Artificial Intelligence (AI) is a broad term that includes both Machine Learning (ML) and Deep Learning (DL). AI involves any technique that enables computers to mimic human behavior. At the dawn of computing in 1950, Alan Turing proposed a thought experiment known as the Turing Test, a method of inquiry for determining whether or not a computer is capable of thinking like a human being. He suggested that it could: if humans can use available information and reason to solve problems, why shouldn't a machine be able to do the same thing? He laid out these ideas in his paper "Computing Machinery and Intelligence", which discusses building intelligent machines and how to test their intelligence. What stopped him from following through with his hypothesis was that the technology of the time had not caught up with his ideas: before 1949, computers could not store commands, they could only execute them. A proof of concept for Turing's thesis was funded by the RAND (Research and Development) Corporation and realized five years later in the Logic Theorist program by Allen Newell, Cliff Shaw, and Herbert A. Simon. This program is considered the first AI program, as it was designed to mimic the problem-solving skills of humans. The term "AI" itself was not coined until 1955, by John McCarthy, who along with Marvin Minsky hosted the Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI) in 1956, where the Logic Theorist was presented during an open-ended discussion with top researchers. Alan Turing, Marvin Minsky, John McCarthy, Allen Newell, and Herbert A. Simon are widely considered the "founding fathers" of AI.
Although you'll sometimes hear the term AI used interchangeably with Machine Learning and Deep Learning, they are not the same thing. Machine Learning is a subset of Artificial Intelligence, consisting of more advanced techniques and models that enable computers to figure things out from data and deliver AI applications. It has been described as the science of getting a computer to act without being explicitly programmed. An artificial neural network (ANN) is one of the main tools used in machine learning. It is a computational model based on the structure and functions of biological neural networks, intended to replicate the way humans learn. While concepts such as deep learning are relatively new, the mathematical theory behind them dates back to 1943, even before the Turing Test, to the work of Warren McCulloch and Walter Pitts. They created the first mathematical model of a neural network in their publication "A Logical Calculus of the Ideas Immanent in Nervous Activity", where they proposed a combination of mathematics and algorithms aimed at mimicking human thought processes. Understanding how this works requires a vastly oversimplified, and not entirely accurate, picture of how a biological neuron works, which is fine because at a high level this is more or less what happens: a neuron receives input from its dendrites, processes it (similar to a CPU) in its soma, and passes the output through its axon (similar in shape and function to a cable) to the synapse, the point of connection to other neurons. The McCulloch-Pitts neuron is still the standard model, even as it has evolved past its original limitations.
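The McCulloch-Pitts model reduces that biological picture to a unit that sums its binary inputs and "fires" when a threshold is reached. A minimal sketch (the function name is illustrative):

```python
def mcp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: fires (returns 1) when the number of
    active binary inputs reaches the threshold, otherwise stays
    silent (returns 0)."""
    return 1 if sum(inputs) >= threshold else 0

# The threshold alone changes the logic the unit computes:
and_out = mcp_neuron([1, 1], threshold=2)  # both inputs needed -> AND
or_out = mcp_neuron([0, 1], threshold=1)   # any input suffices  -> OR
```

A crucial limitation, which motivates the developments below, is that the original model has fixed connections: nothing in it changes with experience.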
One of those evolutions, which allowed the original neural network to "learn", was the perceptron, invented by Frank Rosenblatt in 1958 at the Cornell Aeronautical Laboratory and funded by the United States Office of Naval Research. The perceptron was intended to be a machine rather than a program. It was first implemented in software for the IBM 704 but subsequently built in custom hardware as the "Mark I Perceptron". It was designed for image recognition: it consisted of an array of 400 photocells, randomly connected to the "neurons". Weights were encoded in potentiometers, and weight updates during learning were performed by electric motors.
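Rosenblatt's contribution was a learning rule: nudge the weights toward each misclassified example until the unit classifies the training set correctly. A sketch of that rule in plain Python (the function names are ours, and this is software, not the Mark I's motors and potentiometers):

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Perceptron learning rule: for each misclassified example,
    move the weights and bias in the direction of the target."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred          # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn logical OR, a linearly separable function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```

The rule is guaranteed to converge only when the classes are linearly separable; that limitation, famously highlighted by Minsky and Papert, is part of why multi-layer networks mattered.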
Just as Machine Learning is a subset of AI, Deep Learning is a subset of Machine Learning. Deep Learning makes the computation of multi-layer neural networks practical, delivering high accuracy in tasks such as speech recognition, language translation, and object detection, among many other breakthroughs. Deep learning can automatically learn and extract features from data sets such as images, videos, or text without hand-coded features or rules.
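What the extra layers buy can be seen on a function a single perceptron cannot represent: XOR. With one hidden layer, it becomes expressible. The weights below are hand-set for illustration; in deep learning they would be learned from data:

```python
def step(z):
    """Threshold activation, as in early perceptron-style units."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two-layer network with hand-set weights.

    Hidden unit h1 acts as OR, h2 as AND; the output unit computes
    "OR and not AND", which is exactly XOR -- a function no
    single-layer perceptron can represent."""
    h1 = step(x1 + x2 - 0.5)   # fires if either input is on (OR)
    h2 = step(x1 + x2 - 1.5)   # fires only if both are on (AND)
    return step(h1 - h2 - 0.5)
```

Stacking many such layers, with smooth activations and learned weights, is what turns this toy into the deep networks behind the breakthroughs above.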
|ML, DL, and Computer Vision|
Teaching Computers to "See" and "Understand"
Computer Vision is a field of study that seeks to develop techniques to help computers "see" and understand the content of digital images such as photographs and videos. It can broadly be considered a subset of AI and machine learning. The world we live in is full of cameras and video. Nearly everyone has a smartphone with a camera that they can use to take pictures and post them on Instagram, Facebook, or YouTube. YouTube may be the second largest search engine: hundreds of hours of video are uploaded to it every minute, and billions of videos are watched every day. The Internet is made up of text and images. While indexing and searching text is straightforward, indexing and searching images requires algorithms that know what the images contain. For a long time, indexing images for search depended on the meta descriptions written by the person who uploaded them. The goal of computer vision is to understand the content of digital images. This involves developing methods that attempt to reproduce the capability of human vision: getting machines to see and understand the content of digital images and to extract a description from them.
Computer vision is the automated extraction of information from images. That information can be anything from 3D models, camera position, and object detection and recognition to the grouping and searching of image content, all complicated by the complexity inherent in the visual world.
Computer vision remains one of the most popular applications of artificial intelligence. Computer vision-based AI techniques include image classification, object detection, and object segmentation. It is used for everything from face recognition-based user authentication to inventory tracking in warehouses to vehicle detection on roads. Computer vision uses advanced neural networks and deep learning algorithms such as Convolutional Neural Networks (CNNs), the Single Shot Multibox Detector (SSD), and Generative Adversarial Networks (GANs). Applying these algorithms requires a thorough understanding of neural network architecture, advanced mathematics, and image processing techniques.
Computer Vision and Machine Learning
Machine learning and computer vision are two fields that have become closely associated with one another. Machine learning has been used effectively in computer vision for acquisition, image processing, and object focusing. Computer vision can be broken down into something that involves a digital image or video, a sensing device, an interpreting device, and the interpretation itself. Machine learning comes into play in the interpreting device and interpretation stages, where analysis of the digital recordings is done using machine learning techniques.
For the average Machine Learning developer, the CNN remains a complex branch of AI. Apart from the knowledge and understanding of the algorithms, CNNs demand high-end, expensive infrastructure for training the models, which is out of reach for most developers. Even after managing to train and evaluate a model, developers find model deployment a challenge. Trained CNN models are often deployed on edge devices that don't have the resources required for inferencing, the process of classifying and detecting images at run time. Edge devices are complemented by purpose-built AI chips that accelerate inferencing, which come with their own software drivers and an interfacing layer. Microsoft and Qualcomm have partnered to simplify training and deploying computer vision-based AI models with their Vision AI Developer Kit. Developers can use Microsoft's cloud-based AI and IoT services on Azure to train models while deploying them on a smart camera edge device powered by a Qualcomm AI accelerator.
Recognition in Computer Vision
Recognition in computer vision involves object recognition, identification, and detection. Some of the specialized tasks of recognition include optical character recognition, image retrieval, and facial recognition.
- Object Recognition – most commonly applied to face detection and recognition, this involves finding and identifying objects in a digital image or video. This can be approached by computer vision through either machine learning or deep learning.
- Machine Learning Approach – object recognition in machine learning requires you to define features before classification. This is commonly done through the scale-invariant feature transform (SIFT), which extracts key points of objects and stores them in a database. When an image is to be categorized, SIFT compares its key points against those found in the database.
- Deep Learning Approach – this does not require features to be specifically defined. A common approach here is the convolutional neural network. A convolutional neural network (CNN) is a type of deep learning algorithm that takes in an input image and assigns importance (learnable weights and biases) to various aspects or objects in the image in order to differentiate them from one another. It is inspired by the biological neural networks in the brain. Models trained on ImageNet, a visual database designed for object recognition, are the best example of this; their performance is said to be close to that of humans.
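The operation at the heart of a CNN is convolution: sliding a small kernel of weights over the image and taking a weighted sum at each position. A dependency-free sketch (in a trained CNN the kernel values are learned; here a vertical-edge kernel is hand-set for illustration):

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a CNN layer.

    Each output cell is a weighted sum of one image patch; stacking
    many such kernels, plus nonlinearities and pooling, builds a CNN."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A hand-set vertical-edge kernel responds where dark meets light.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge = [[-1, 1],
        [-1, 1]]
fmap = conv2d(img, edge)   # strong response along the middle column
```

The "learnable weights and biases" of the bullet above are precisely these kernel entries, adjusted during training rather than chosen by hand.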
Motion Analysis in Computer Vision
Motion analysis in computer vision involves processing a digital video to produce information. Simple processing can detect the motion of an object; more complex processing tracks an object over time and can determine the direction of its motion. It has applications in motion capture, sports, and gait analysis.
- Motion capture – (sometimes referred to as mo-cap or mocap, for short) is the process of recording the movement of objects or people. Markers are worn near joints to identify motion. It has applications in animation, sports, the military, robotics, computer vision, and gait analysis. Typically, visual appearance is not recorded, only the movements of the actors. Motion capture was used in Star Wars by Andy Serkis for Supreme Leader Snoke and Lupita Nyong'o for Maz Kanata.
- Gait analysis – involves the systematic study of animal locomotion, more specifically the study of human motion, using the eye and the brain of observers, augmented by instrumentation for measuring body movements, body mechanics, and the activity of the muscles. A typical gait analysis laboratory has several cameras (video or infrared) placed around a walkway or a treadmill, which are linked to a computer. The subject wears markers at various reference points of the body and as they move, a computer calculates the trajectory of each marker in three dimensions. It can be applied in sports biomechanics.
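The simplest form of the motion detection described above is frame differencing: flag the pixels whose intensity changed between consecutive frames. A minimal sketch on toy grayscale frames (the function name and threshold are illustrative; real systems add background models or optical flow on top of this idea):

```python
def detect_motion(prev_frame, curr_frame, threshold=30):
    """Flag (x, y) pixel positions whose grayscale intensity changed
    by more than `threshold` between two consecutive frames."""
    moved = []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) > threshold:
                moved.append((x, y))
    return moved

# A bright "object" at x=1 in the first frame shifts to x=3 in the second;
# differencing flags both its old and new positions.
frame1 = [[0, 200, 0, 0]]
frame2 = [[0, 0, 0, 200]]
moved = detect_motion(frame1, frame2)
```

Tracking an object over time, as in the gait and sports examples, amounts to linking such detections (or marker positions) from frame to frame.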
Computer vision is used in sports to improve the broadcast experience, athlete training, analysis and interpretation, and decision making. Video tracking and object recognition are ideal for tracking the movement of players, and motion analysis methods assist in motion tracking. Deep learning using convolutional neural networks is used to analyze the data. Computer vision is also used in autonomous vehicles such as self-driving cars. Cameras placed around the car provide a 360-degree field of vision with up to 250 meters of range. The cameras aid in lane finding, road curvature estimation, obstacle detection, traffic sign detection, and much more.