What is the difference between computer vision and image recognition?

Image recognition is one specific task within the broader field of computer vision. Computer vision encompasses any system that extracts meaning from visual data — this includes image classification, object detection, tracking moving objects, measuring distances, reading text, and reconstructing 3D scenes. Image recognition is the narrower task of assigning a label to an entire image.

How accurate are modern computer vision systems?

Accuracy depends heavily on the task, the quality of training data, and the conditions under which the system operates. On well-defined tasks with high-quality data, modern deep learning models can match or exceed human accuracy. For example, on the ImageNet classification benchmark, top models now achieve top-5 error rates below 2%, compared to an estimated human top-5 error rate of around 5%. In production systems under real-world conditions (variable lighting, occlusion, unusual angles), accuracy is typically lower and must be measured on a representative sample of real deployment data.
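As a rough illustration of measuring accuracy on deployment data, the sketch below breaks accuracy down by operating condition. The record layout (condition, predicted label, true label) and the sample values are assumptions made up for this example, not data from any real system.

```python
from collections import defaultdict

def accuracy_by_condition(records):
    """Return accuracy per operating condition.

    records: iterable of (condition, predicted_label, true_label) tuples.
    This tuple layout is a hypothetical format chosen for the example.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for condition, predicted, actual in records:
        totals[condition] += 1
        if predicted == actual:
            correct[condition] += 1
    return {c: correct[c] / totals[c] for c in totals}

# Made-up sample: a few hand-labelled frames from two conditions.
sample = [
    ("daylight", "car", "car"),
    ("daylight", "truck", "truck"),
    ("daylight", "car", "car"),
    ("daylight", "bus", "car"),
    ("night", "car", "car"),
    ("night", "car", "truck"),
]
report = accuracy_by_condition(sample)
print(report)
```

A breakdown like this tends to show where a model that looks strong on a benchmark loses accuracy in the field, for example at night or under occlusion.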

Is computer vision the same as CCTV or surveillance?

No. CCTV is a hardware system for capturing and recording video. Computer vision is the software layer that analyses what is captured. A standard CCTV system records footage that a human must review. A computer vision system processes that same footage automatically to detect objects, count people, identify anomalies, or generate alerts — without requiring human review of every frame.

Does a computer vision system need the internet to work?

Not necessarily. Many production systems run inference entirely on local hardware — an edge device or on-premise server — without sending data to the cloud. This is important for latency-sensitive applications like real-time traffic monitoring and for privacy-sensitive applications like workplace attendance. Cloud connectivity may be used for model updates and centralised reporting, but core inference can run offline.

How much data is needed to train a computer vision model?

It depends on the task and whether you are training from scratch or fine-tuning a pre-trained model. Fine-tuning a state-of-the-art pre-trained model (such as a YOLO variant for object detection) can produce usable results with as few as 500-2,000 labelled examples per class. Training from scratch for a novel task may require tens of thousands of labelled examples. Data augmentation techniques — flipping, cropping, adjusting brightness — can multiply the effective size of a small dataset.
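The augmentation techniques mentioned above can be sketched in a few lines. This is a minimal illustration that treats a greyscale image as a nested list of pixel values; a real project would normally use an image library, and the function names here are hypothetical.

```python
import random

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Add delta to every pixel, clamped to the 0-255 range."""
    return [[max(0, min(255, px + delta)) for px in row] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def augment(image, rng):
    """Return the original image plus three augmented variants."""
    return [
        image,
        flip_horizontal(image),
        adjust_brightness(image, rng.randint(-40, 40)),
        random_crop(image, len(image) - 1, len(image[0]) - 1, rng),
    ]

# Tiny made-up 3x3 greyscale image.
image = [[10, 20, 30], [40, 50, 60], [70, 80, 250]]
variants = augment(image, random.Random(0))
print(len(variants), "training images from 1 original")
```

For detection or segmentation tasks, the labels (bounding boxes, masks) have to be transformed together with the image; that step is omitted here.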

What industries in Indonesia benefit most from computer vision?

Transportation and traffic management, manufacturing quality control, retail analytics (footfall counting, shelf monitoring), public security and access control, agriculture (crop disease detection via drone imagery), and government services (document verification, attendance management) are all active adopters of computer vision technology in Indonesia.

What is Computer Vision? Real-World Applications in Indonesia

Computer vision is the field of artificial intelligence that enables machines to extract structured information from images and video — essentially, it gives computers the ability to see and interpret the visual world. In Indonesia, computer vision systems are already running in government offices, on public roads, and inside factories.

What is Computer Vision?

Computer vision is a branch of artificial intelligence focused on enabling machines to interpret and understand visual information from the world — photographs, video streams, medical scans, satellite imagery, or any other form of image data. A computer vision system takes raw pixels as input and produces structured outputs: labels, bounding boxes, measurements, alerts, or decisions.

The field draws on classical image processing techniques (edge detection, morphological operations) combined with modern deep learning, particularly convolutional neural networks (CNNs) and vision transformers. It is one of the most mature and commercially deployed areas of AI, with decades of research now translating into production systems across many industries.
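To make the link between classical techniques and CNNs concrete, here is a small sketch of edge detection with a hand-written Sobel kernel. The sliding-window operation is the same one a convolutional layer performs; the difference is that a CNN learns its kernel values during training instead of having them fixed by an engineer. The image values are made up for the example.

```python
# Sobel kernel that responds to left-to-right changes in brightness.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    Strictly this is cross-correlation, which is what CNN layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        output.append(row)
    return output

# Made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
]
edges = convolve2d(image, SOBEL_X)
print(edges)  # zero in the flat region, large values at the vertical edge
```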

How Does Computer Vision Work?

At its core, a computer vision pipeline transforms raw visual data into machine-understandable representations through a series of processing stages.

Data ingestion captures frames from a camera, scanner, or image file and normalises them to a consistent format and resolution.

Feature extraction identifies the patterns in pixels that are meaningful for the task: edges, textures, shapes, spatial relationships. In modern deep learning systems, this stage is not hand-coded by engineers; the model learns which features matter during training, through exposure to thousands or millions of labelled examples.

Inference applies the trained model to new, unseen images to produce a prediction: "this region contains a vehicle of type sedan travelling at an estimated 67 km/h" or "this employee badge number matches the face in frame 4,821."

Post-processing filters, aggregates, and formats model outputs into actionable information — generating an alert, updating a database record, or rendering an annotation on a display.
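As one possible illustration of the post-processing stage, the sketch below drops low-confidence detections and then removes near-duplicate boxes with non-maximum suppression, a common technique for object detectors. The detection dictionary layout and the threshold values are assumptions for this example, not the format of any particular model.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def postprocess(detections, min_score=0.5, iou_threshold=0.5):
    """Filter by confidence, then keep only the best box per object."""
    candidates = sorted(
        (d for d in detections if d["score"] >= min_score),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Suppress a box that overlaps a higher-scoring box of the same label.
        overlaps = any(
            det["label"] == k["label"]
            and iou(det["box"], k["box"]) >= iou_threshold
            for k in kept
        )
        if not overlaps:
            kept.append(det)
    return kept

# Made-up raw model output for one frame.
detections = [
    {"label": "car", "score": 0.92, "box": (10, 10, 110, 60)},
    {"label": "car", "score": 0.85, "box": (12, 12, 112, 62)},  # near-duplicate
    {"label": "motorcycle", "score": 0.30, "box": (200, 40, 230, 90)},  # low confidence
    {"label": "truck", "score": 0.77, "box": (300, 20, 420, 100)},
]
events = postprocess(detections)
print([d["label"] for d in events])
```

The surviving detections are what a downstream system would turn into alerts, database records, or on-screen annotations.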

The training process that produces the model is separate from inference. Training is computationally expensive (typically run on GPUs in the cloud or on high-powered workstations) and happens once or periodically. Inference can be fast enough to run in real time on modest edge hardware, which is why computer vision can be deployed at traffic intersections or factory floors without a constant cloud connection.

What Computer Vision Applications Are Already Running in Indonesia?

1. Traffic Monitoring and Vehicle Detection

AIGLE, developed by PT Graha Teknologi Maju for deployment in East Java, is a traffic detection system that uses computer vision to identify and count vehicles, detect congestion, identify traffic violations, and generate real-time data for traffic management authorities. The system processes live video feeds from road cameras and produces structured event data without requiring human operators to monitor every feed continuously.

This kind of system is directly relevant to Indonesia's traffic density challenge. Greater Jakarta alone has approximately 23 million registered vehicles, and managing that volume without automated monitoring would require an impractical number of human operators. Computer vision makes large-scale traffic management feasible.
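To give a sense of what "structured event data" can look like downstream, here is a minimal sketch that aggregates per-vehicle detection events into per-minute counts and flags busy minutes. It is not AIGLE's implementation; the event fields (`timestamp_s`, `vehicle_class`) and the congestion threshold are hypothetical.

```python
from collections import Counter

def count_per_minute(events):
    """Group detection events into per-minute counts by vehicle class."""
    counts = {}
    for ev in events:
        minute = ev["timestamp_s"] // 60
        counts.setdefault(minute, Counter())[ev["vehicle_class"]] += 1
    return counts

def congested_minutes(counts, threshold):
    """Return the minutes whose total vehicle count reaches the threshold."""
    return sorted(m for m, c in counts.items() if sum(c.values()) >= threshold)

# Made-up events, one per vehicle passing a single camera.
events = [
    {"timestamp_s": 5, "vehicle_class": "car"},
    {"timestamp_s": 20, "vehicle_class": "motorcycle"},
    {"timestamp_s": 59, "vehicle_class": "car"},
    {"timestamp_s": 61, "vehicle_class": "truck"},
    {"timestamp_s": 125, "vehicle_class": "car"},
    {"timestamp_s": 130, "vehicle_class": "car"},
    {"timestamp_s": 140, "vehicle_class": "motorcycle"},
    {"timestamp_s": 170, "vehicle_class": "car"},
]
counts = count_per_minute(events)
congested = congested_minutes(counts, threshold=4)
print(counts, congested)
```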

2. Face Recognition Attendance Systems

PT Graha Teknologi Maju's HR Management System uses facial recognition to automate employee attendance recording. Employees check in and out by standing in front of a camera; the system matches their face against enrolled records and logs the timestamp without any card, PIN, or manual process.

The practical benefit is not just convenience — it eliminates proxy attendance (one employee clocking in for another), reduces administrative overhead, and produces an accurate digital record that integrates directly with payroll and HR systems. The system is deployed in environments where managing attendance for hundreds of employees across multiple locations would otherwise require significant administrative effort.

You can read more about this project in the HR Management System portfolio.
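The matching step in a face recognition attendance system is often implemented by comparing embedding vectors. The sketch below assumes a separate face model has already turned each face into a numeric vector; the three-dimensional vectors, employee IDs, and the 0.8 threshold are made up for illustration and do not describe the HR Management System's actual method.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_face(embedding, enrolled, threshold=0.8):
    """Return the best-matching enrolled ID, or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for employee_id, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = employee_id, score
    return best_id

# Hypothetical enrolled records (real embeddings have hundreds of dimensions).
enrolled = {
    "EMP-001": [0.9, 0.1, 0.2],
    "EMP-002": [0.1, 0.8, 0.5],
}
known = match_face([0.88, 0.12, 0.21], enrolled)
unknown = match_face([0.0, 0.0, 1.0], enrolled)
print(known, unknown)
```

Returning no match below the threshold is what lets such a system reject faces that are not enrolled instead of logging the nearest employee.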

3. Manufacturing Quality Control

Computer vision systems on production lines inspect products at a speed and with a consistency that human inspectors cannot match. A camera positioned above a conveyor belt can examine every unit for surface defects, dimensional accuracy, label placement, and seal integrity at rates exceeding 1,000 units per minute. Systems deployed in Indonesian food and beverage manufacturing have achieved defect detection rates above 99%, compared to the 92-95% typical of trained human inspectors working at speed.

4. Agricultural Crop Disease Detection

Drone-mounted and satellite-based computer vision is being applied to Indonesian plantation agriculture — particularly palm oil, rubber, and rice. Models trained on aerial imagery can identify early signs of disease, nutrient deficiency, and drought stress across thousands of hectares in a single flight, enabling targeted intervention before damage spreads. Several pilot programs are operating in Kalimantan and Sumatra, with commercial deployments growing.

5. Document Verification and OCR

Government and financial services are using computer vision for automated document processing — reading identity cards (KTP), extracting data from forms, verifying that submitted documents are genuine rather than altered. The Indonesian banking sector's KYC (Know Your Customer) processes have been substantially automated using OCR and document classification models, reducing manual review time per application from minutes to seconds.
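OCR output usually needs validation before it is trusted. As a hedged example, the sketch below checks an identity number read from a KTP, assuming the commonly described NIK layout: 16 digits made up of a 6-digit region code, a 6-digit birth date (DDMMYY, with 40 added to the day for women), and a 4-digit sequence number. The function name is hypothetical and the sample number is synthetic.

```python
import re

def parse_nik(raw_text):
    """Validate OCR text as a NIK and split it into fields.

    Returns a dict, or None if the text does not fit the assumed layout.
    Stripping every non-digit is a simplification; real OCR clean-up
    also has to handle confusions such as O/0 and I/1.
    """
    digits = re.sub(r"\D", "", raw_text)
    if len(digits) != 16:
        return None
    day = int(digits[6:8])
    month = int(digits[8:10])
    gender = "female" if day > 40 else "male"
    if day > 40:
        day -= 40
    if not (1 <= day <= 31 and 1 <= month <= 12):
        return None
    return {
        "region_code": digits[:6],
        "birth_day": day,
        "birth_month": month,
        "birth_year_2digit": digits[10:12],
        "gender": gender,
        "sequence": digits[12:],
    }

# Synthetic number with an all-zero region code, not a real identity.
fields = parse_nik("NIK : 0000005203900002")
print(fields)
```

A record that fails a check like this can be routed to manual review instead of being accepted automatically.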

What Do You Need to Build a Computer Vision System?

A production computer vision system requires four components:

Hardware. Cameras suitable for the environment (resolution, frame rate, weather resistance where applicable), an inference device (edge computer, embedded GPU, or cloud VM), and networking to move data where it needs to go. The specific hardware profile depends on whether real-time local inference is required or whether cloud inference latency is acceptable.

Labelled training data. Images annotated with the information the model needs to learn — bounding boxes around vehicles, segmentation masks around defective areas, identity labels on faces. Data labelling is often the most time-consuming and expensive part of a computer vision project.

Model development. Selecting an appropriate architecture (YOLO variants are common for real-time detection; U-Net variants for segmentation; transformer-based models for complex scene understanding), training on the labelled dataset, and evaluating performance against a held-out test set.

Integration and deployment. Connecting model outputs to the systems that act on them — dashboards, alert systems, databases, ERP. This integration layer is frequently underestimated but is critical to delivering business value.
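The evaluation part of model development can be sketched as follows: hold back a portion of the labelled data before training, then score predictions on it with precision and recall. The 20% split, the fixed seed, and the "defect" label are assumptions for this example.

```python
import random

def split_dataset(items, test_fraction=0.2, seed=42):
    """Shuffle deterministically and hold out a test set the model never trains on."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def precision_recall(predictions, labels, positive="defect"):
    """Precision: how many flagged items were right. Recall: how many true positives were found."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Image IDs stand in for labelled images.
train_ids, test_ids = split_dataset(range(100))

# Made-up predictions and ground truth for four held-out images.
precision, recall = precision_recall(
    ["defect", "ok", "defect", "ok"],
    ["defect", "defect", "ok", "ok"],
)
print(len(train_ids), len(test_ids), precision, recall)
```

Reporting precision and recall separately matters in inspection tasks, where a missed defect and a false alarm usually carry different costs.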

Our computer vision services cover all four components, from hardware specification through model training to production deployment.

When Does Your Business Need Computer Vision?

Consider computer vision when your business faces any of these conditions:

You are paying people to watch or review visual content at scale. If staff spend significant hours reviewing CCTV footage, inspecting products, or manually entering data from forms and documents, computer vision can automate or substantially assist these tasks.

Human inspection is a bottleneck in your process. If quality control or document verification is the slowest step in your pipeline, a vision system operating continuously at machine speed can remove that constraint.

Consistency matters more than throughput alone. Human inspectors fatigue. A computer vision system applies the same criteria to the 10,000th image as it did to the first.

You have visual sources that currently generate no structured data. Road cameras, factory cameras, and drone footage all produce vast amounts of information that is invisible to business intelligence systems unless a computer vision layer extracts it.

For organisations that have identified a visual inspection, monitoring, or analysis problem, the AIGLE traffic monitoring project is a useful reference point for the scope and timeline of a production deployment.