Data Trends: Six Ways Data Will Change Business in 2023 and Beyond

Posted on December 28, 2022 by Kopius

By Kristina Scott

Data is big and getting bigger. We’ve tracked six major data-driven trends for the coming year.

Data is one of the fastest-growing and most innovative opportunities today to shape the way we work and lead. IDC predicts that by 2024, the inability to execute a data- and AI-driven strategy will negatively affect 75% of the world’s largest public companies. And by 2025, 50% of those companies will promote data-informed decision-making by embedding analytics in their enterprise software (up from 33% in 2022), boosting demand for more data solutions and data-savvy employees.

Here is how data trends will shift in 2023 and beyond:

1. Data Democratization Drives Data Culture

If you think data is only relevant to analysts with advanced knowledge of data science, we’ve got news for you. Data democratization is one of the most important trends in data. Gartner research forecasts that 80% of data-driven initiatives that are focused on business outcomes will become essential business functions by 2025.

Organizations are creating a data culture by attracting data-savvy talent and promoting data use and education for employees at all levels. To support data democratization, data must be exact, easily digestible, and accessible.

Research by McKinsey found that high-performing companies have a data leader in the C-suite and make data and self-service tools universally accessible to frontline employees.

2. Hyper-Automation and Real-Time Data Lower Costs

Real-time data and its automation will be the most valuable big data tools for businesses in the coming years. Gartner forecasts that by 2024, rapid hyper-automation will allow organizations to lower operational costs by 30%. And by 2025, the market for hyper-automation software will hit nearly $860 billion.

3. Artificial Intelligence and Machine Learning (AI & ML) Continue to Revolutionize Operations

The ability to implement AI and ML in operations will be a significant differentiator. Verta Insights found that industry leaders that outperform their peers financially are more than twice as likely to ship AI projects, products, or features, and have invested in AI/ML at a higher level than their peers.

AI and ML technologies will boost the Natural Language Processing (NLP) market. NLP enables machines to understand and communicate with us in spoken and written human languages. The NLP market size will grow from $15.7 billion in 2022 to $49.4 billion by 2027, according to research from MarketsandMarkets.

We have seen the wave of interest in OpenAI’s ChatGPT, a conversational language-generation tool. This highly scalable technology could revolutionize a range of use cases, from summarizing changes to legal documents to completely changing how we research information through dialogue-like interactions, says CNBC.

This can have implications in many industries. For example, the healthcare sector already employs AI for diagnosis and treatment recommendations, patient engagement, and administrative tasks.

4. Data Architecture Leads to Modernization

Data architecture accelerates digital transformation because it solves complex data problems through the automation of baseline data processes, increases data quality, and minimizes silos and manual errors. Companies modernize by leaning on data architecture to connect data across platforms and users. Companies will adopt new software, streamline operations, find better ways to use data, and discover new technological needs.

According to MuleSoft, organizations are ready to automate decision-making, dynamically improve data usage, and cut data management efforts by up to 70% by embedding real-time analytics in their data architecture.

5. Multi-Cloud Solutions Optimize Data Storage

Cloud use is accelerating. Companies will increasingly opt for a hybrid cloud, which combines the best aspects of private and public clouds.

Companies can access data collected by third-party cloud services, which reduces the need to build custom data collection and storage systems, which are often complex and expensive.

In the Flexera State of Cloud Report, 89% of respondents have a multi-cloud strategy, and 80% are taking a hybrid approach.

6. Enhanced Data Governance and Regulation Protect Users

Effective data governance will become the foundation for impactful and valuable data.

As more countries introduce laws to regulate the use of various types of data, data governance comes to the forefront of data practices. European GDPR, Canadian PIPEDA, and Chinese PIPL won’t be the last laws that are introduced to protect citizen data.

Gartner has predicted that by 2023, 65% of the world’s population will be covered by regulations like GDPR. In turn, users will be more likely to trust companies with their data if they know it is more regulated.

Valence works with clients to implement a governance framework, find sources of data and data risk, and activate the organization around this innovative approach to data and process governance, including education, training, and process development.

What these data trends add up to

As we step into 2023, organizations that understand current data trends can harness data to become more innovative, strategic, and adaptable. Our team helps clients with data assessments, by designing and structuring data assets, and by building modern data management solutions. We strategically integrate data into client businesses, use machine learning and artificial intelligence to create proactive insights, and create data visualizations and dashboards to make data meaningful.

At Kopius, we’ve designed a program to JumpStart your customer, technology, and data success.

JumpStart Your Data Transformation

Our JumpStart program fast-tracks business results and platform solutions. Connect with us today to enhance your customer satisfaction through a data-driven approach, drive innovation through emerging technologies, and achieve competitive advantage. Add our brainpower to your operation by contacting our team to JumpStart your business.

Retail Technology and Innovation – a Conversation with Michael Guzzetta

Posted on October 28, 2022 by Kopius

We recently spent some time with Michael Guzzetta, a seasoned retail technology and innovation executive and consultant who has worked with brands such as The Walt Disney Company, Microsoft, See’s Candies, and H-E-B.

Tell me about your background. What brought you to retail?

Like many people, I launched my retail career in high school when I worked in the men’s department at Robinson’s May. I also worked for The Warehouse (music retailer) and was a CSR at Blockbuster Video. Strangely, I still miss the satisfaction of organizing tapes on shelves.

I ignited my tech career in 2001 when I started working in payment processing and cloud-based tech, and then I returned to retail in 2009 when I joined Disney Store North America, one of the world’s strongest retail brands.

During my tenure at Disney, I had the privilege of working at the intersection of creative, marketing, and mobile/digital innovation. And this is where the innovation bug bit me and kicked off my decades-long work on omnichannel innovation projects. I seek opportunities to test and deploy in-store technology to simplify experiences for customers and employees, increase sales, and drive demand. Since jump-starting this journey at Disney Store, I’ve also helped See’s Candies, Microsoft, and H-E-B to advance their digital transformation through retail innovation.

What are some of the retail technologies that got you started?

I’ve seen it all! I’ve re-platformed eCommerce sites, deployed beacons and push notifications, deployed in-store traffic counting, worked on warehouse efficiency, automated and integrated buyer journeys and omnichannel programs, and more. I recently built a 20,000-square-foot innovation lab to run proofs of concept that validate and test technology before it is deployed in live environments. Smart checkout, supply chain, inventory management, eCommerce… you name it.

What are the biggest innovation challenges in retail today?

Some questions that keep certain retailers up at night are, “How can we simplify the shopping experience for customers and make it easier for them to check out?”, “How can we optimize our supply chain and inventory operations?”, “How can we improve accuracy for customers shopping online and reduce substitutions and shorts in fulfillment?” and “How can we make it easier and more efficient for personal shoppers to shop curbside and home delivery orders?” Not to mention, “What is the future of retail, and which technologies can help us stay competitive?”

I see potential in several trends to address those challenges, but my top three are:

Artificial Intelligence/Machine Learning. AI will continue to revolutionize retail. It’s permeated most of the technology we use today, whether it’s SaaS or hardware, like smart self-checkout. You can use AI, computer vision, and machine learning to identify products and immediately put them in your basket. AI is embedded in our everyday lives – it powers the smart assistants we use daily, monitors our social media activity, helps us book our travel, and runs self-driving cars, among dozens of other applications. And as a subset of AI, machine learning allows models to continue learning and improving, further advancing AI capabilities. I could go on, but suffice it to say that the retailer that nails AI first wins.

Computer vision. Computer vision has a sizable opportunity to solve inventory issues, especially for grocery brands. Today, there’s a gap between online inventory and what’s on the shelf, because the inventory system can’t keep pace with what’s actually stocked for personal shoppers. That is frustrating for customers who don’t expect substitutions or out-of-stock deliveries. With the advent of computer vision cameras, you can reconcile those differences and see what is on the shelf in real time, so that what is shown as available online is accurate. Computer vision-supported inventory management will be vital to creating a truly omnichannel experience. Computer vision also enables smart shopping carts, self-checkout kiosks, loss prevention, and theft prevention. Not to mention Amazon’s use of CV cameras with their Just Walk Out tech in Amazon Go, Amazon Fresh, and specific Whole Foods locations. It has endless applications for retail and gives you the eyes online that you can’t get in stores today.

Robotics. In the last five years, robotics has taken a seismic leap, and a shift has happened, which you can see in massive, automated fulfillment centers like those operated by Amazon, Kroger, and Walmart. A brand can deliver groceries in a region without having a physical store, thanks to robotic fulfillment centers and distribution centers. It’s a game-changer. Robotics has many functions beyond fulfillment in retail, but this application truly stands out.

What is a missed opportunity that more retail brands should take advantage of?

Data. Data is huge, and its importance can’t be overstated. It’s a big, missed opportunity for retailers today. Improving data management, governance, and sanitation is a massive opportunity for retailers that want to innovate.

Key opportunity areas around data in retail include customer experience (know your customer), understanding trends related to customer buying habits, and innovation. You can’t innovate at any speed with dirty data.

There’s a massive digital transformation revolution underway among retailers, and they are trying to innovate with data, but they have so much data that it can be overwhelming. They are trying to create data lakes and a single source of truth, and sometimes those efforts fail because of disparate data networks. I believe that some of the more prominent retailers will have their data act together in a few years.

“Dirty data” results from companies being around for a long time, so they’ve accrued multiple data sets and cloud providers, and their data hasn’t been merged and cleaned. If you don’t have the right data, you are making decisions based on bad or old data, which could hurt you strategically or literally.

What do you wish more people understood about retail technology and innovation?

Technology will not replace people. In my experience, technology is meant to enhance the human experience, which includes employees. If technology simplifies the process so much that the employees become idle, they are typically trained to manage the technology or cross-trained to grow their careers. Technology isn’t replacing the human experience any time soon, although it is undoubtedly changing the existing work experience – ideally for the better, both for the employees and the bottom line.

Technology doesn’t always lower costs for retailers. Hardware innovation requires significant capital expenses when it’s deployed chain-wide. Amazon’s “Just Walk Out” is impressive technology, but the infrastructure, cloud computing costs, and computer vision cameras are insanely expensive. In 5 years, that may be different, but today it is a loss leader. It’s worth it for Amazon because they can get positive press, demonstrate innovation, and show industry leadership. But Amazon has not lowered its operating costs with “Just Walk Out.” This is just one example, but there are many out there.

Online shopping will not eliminate brick-and-mortar shopping. If the pandemic has taught us anything, it’s that online shopping is here to stay – and convenience is extremely attractive to consumers. But I think people will never stop going to stores because people love shopping. The experience you get by tangibly picking something up and engaging with employees in a store location will always be around, even with the advent of the Metaverse.

What are some brands that excite you right now because of how they use technology?

Amazon. What they have been doing with Just Walk Out technology, dash carts, smart shelves, and other IoT technology puts Amazon at the front of the innovation pack. Let’s not forget that they’ve led the way in same or next-day delivery by innovating with their automated fulfillment centers! They have the desire, the resources, and the talent to be the frontrunner for years to come.

Alibaba. This Chinese company is another retailer that uses technology in incredible ways. Their HEMA retail grocery stores are packed with innovation and technology. They have IoT sensors across the stores, electronic shelf labels, facial recognition cameras so you can check out with your face, and robotic kitchens where your order is made and delivered on conveyor belts. They also have conveyors throughout the store, so a personal shopper can shop by zone, then hook bags to be carried to the wareroom for sortation and delivery prep – it’s impressive.

Walmart and Kroger. Both brands’ use of automated fulfillment centers (AFCs) and drone technology (among many others) is pushing the boundaries of grocery retail today. Their AFCs cast a much wider net and have expanded their existing markets, so, for example, we may see Kroger trucks in neighborhoods that don’t have a store in sight.

Home Depot. They have a smart app with 3D augmented reality and robust in-store mapping/wayfinding. Their use of machine learning is also impressive. For example, it helps them better understand what type of projects a customer might be working on based on their browsing and shopping habits.

Sephora. They use beacon technology to bring people with the Sephora app into the store and engage them. They have smart mirrors that help customers pick the right makeup for their skin tone and provide tutorials. Customers can shop directly through smart mirrors or work with an in-store makeup artist.

What advice do you have for retailers that want to invest in technology innovation?

My first piece of advice is to include change management in the project planning from the start.

There are inherent challenges in retail innovation, often due to change management issues. When a company has been around for decades or even more than a century, it operates with well-known, trusted, and often outdated infrastructure. While that infrastructure can’t uphold the company for the next several decades or centuries, there can be a fear of significant change and a deeply rooted preference for existing systems. There can also be a fear of job loss because of the misconception that technology will replace people in retail.

Bring those change-resistant people into the innovation process early and often and invite them to be part of the idea generation. Any technology solution needs to be designed with the user’s needs in mind, and this audience is a core user group. Think “lean startup” approach.

My second piece of advice is to devote enough resources to innovation and give the innovation team the power to make decisions. The innovation team should still operate with lean resources, focusing on minimum viable products and proofs of concept, so failures aren’t cost-prohibitive. The innovation team performs best when it has the autonomy to test, learn, and fail as it explores innovative solutions. Then, it reports its findings and recommendations to higher-ups to calibrate and pivot where needed.

In closing, I’d say the key to innovation success is embracing the notion of failure. Failure has value! Put another way, failure is the fast track to learning. Learning what not to do and what to try next can help a retail company accelerate faster than the competition. Think MVP, stay lean, get validated feedback quickly, and iterate until you have a breakthrough. And always maintain a growth mindset – never stop learning and growing.

JumpStart Your Retail Innovation Success

Innovating technology is crucial, or your business will be left behind. Our expertise in technology and business helps our clients deliver tangible outcomes and accelerate growth. At Kopius, we’ve designed a program to JumpStart your customer, technology, and data success.

Kopius has an expert emerging tech team. We bring this expertise to your JumpStart program and help uncover innovative ideas and technologies supporting your business goals. We bring fresh perspectives while focusing on your current operations to ensure the greatest success.

Partner with Kopius and JumpStart your future success.

3 Reasons Companies Advance Their Data Journey to Combat Economic Pressure

Posted on October 5, 2022 by Kopius

By Danny Vally

Have you updated your organization’s data journey lately? We are living in the Zettabyte Era, because the volume, velocity, and variety of data assets being managed by companies are big and getting bigger.

Data is getting more complicated and siloed. Today’s data is more complex than the data a typical business managed just twenty years ago. Even small companies deal with large data sets from disparate sources that can be complicated to process. Each data set may have its own unique structure, size, query language, and type.

The types of data are also changing quickly. What used to be managed in spreadsheets now demands automated systems that can handle machine data, social network data, IoT data, customer data, and more.

There are real economic advantages for companies that take advantage of the data opportunity by investing in digital transformation (often starting by moving data to the cloud). Companies that take control of data outperform the competition:

40% more revenue per employee
50% higher average net income on revenue
$100M in additional operating income annually

Common data journey scenarios that motivate data-driven investments include:

Understand and predict customer behavior in real-time
Cut costs and free up resources with simplified data analysis
Explore new business models by finding new relationships in data
Eliminate surprise and unnecessary expenses
Gather and unify data to better understand your business

A data strategy is more than a single tool, dashboard, or report. A mature data strategy for any business includes a roadmap to plan the company’s data architecture, migration, integration, and management. Building in governance planning to ensure data security, integrity, access, quality, and protection will empower a business to scale.

That roadmap may also include incorporating artificial intelligence and machine learning, which unleashes predictive analytics, deep learning, and neural networks. While these once were understood to be tools available only to the world’s largest businesses, AI and ML are actually being deployed at even small and midsized businesses, with much success.

We work with organizations throughout their data journey by helping to establish where they are, where they want to go, and what they want to achieve.

A data journey usually starts by understanding data sources and organizing the data. Many organizations have multiple data sources, so creating a common data store is an important starting point. Once the data is organized, we can harness insights from the data using reporting and visualization, which enables a real-time understanding of key metrics. Ensuring data governance and trust in sharing data is another important step, which is often supported by security. Lastly, advanced data work can use artificial intelligence and machine learning to look for data trends, predict behaviors, and extract new insights. By understanding where your organization is in its data journey, you can begin to visualize its next step.

Contact Kopius to JumpStart Your Success Today

At Kopius, we’ve designed a program to JumpStart your customer, technology, and data success.

Tailored to your needs, our user-centric approach, tech smarts, and collaboration with your stakeholders equip teams with the skills and mindset needed to:

Identify unmet customer, employee, or business needs
Align on priorities
Rapidly prototype solutions
And, fast-forward success

Gather your best and brightest business-minded individuals and join our experts for a hands-on workshop that encourages innovation and drives new ideas.

Digital Twins, Machine Learning, and IoT

Posted on December 28, 2021 by Kopius

Digital twins are part of the Internet of Things (IoT) interconnected system. In 2021, Accenture positioned them as one of the top five strategic technology trends to watch.

As the name suggests, a digital twin is a virtual model designed to reflect a physical object. Companies like Chevron are using digital twins to predict maintenance issues faster, and Unilever used one on the Azure IoT platform to analyze and fine-tune factory operations such as temperatures and production cycle times.

With a digital twin, the object being studied is outfitted with sensors related to key areas of functionality to produce data about aspects of the physical object’s performance, such as energy output, temperature, and weather conditions. The data is relayed to a processing system and applied to the twin.

Once informed with this data, the digital twin can run simulations, study performance issues, and generate possible improvements, all while generating insights that can be applied to the physical object.
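As a rough illustration of that loop (sensor data relayed to the twin, then used to study performance), here is a minimal Python sketch. The `PumpTwin` class, its temperature threshold, and the sample readings are all hypothetical, and the simple averaging rule merely stands in for whatever simulation or model a real twin would run.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class PumpTwin:
    """Hypothetical virtual model of one physical pump."""

    max_safe_temp_c: float = 80.0  # assumed threshold, for illustration only
    temps_c: list = field(default_factory=list)

    def apply_reading(self, temp_c: float) -> None:
        # Data relayed from the physical sensor is applied to the twin.
        self.temps_c.append(temp_c)

    def average_temp(self, window: int = 3) -> float:
        # Average of the most recent readings (requires at least one reading).
        return mean(self.temps_c[-window:])

    def needs_maintenance(self, window: int = 3) -> bool:
        # A toy rule standing in for a real simulation or ML model.
        return self.average_temp(window) > self.max_safe_temp_c


twin = PumpTwin()
for reading in [70.0, 75.0, 82.0, 85.0, 88.0]:
    twin.apply_reading(reading)

print("Average of last 3 readings:", twin.average_temp())
print("Maintenance suggested:", twin.needs_maintenance())
```

The insight produced by the twin (here, a maintenance flag) is what would then be applied back to the physical object.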

Sometimes digital twins include a rich immersive visual experience, but that’s not always the case. Sometimes they have a simple interface or no interface at all.

Digital Twins are part of the evolution of IoT within the digital transformation. They are used often today in commercial real estate and facilities planning, and as we think about the metaverse, digital twins take on increasing importance with virtual spaces. When you think about the implications of machine learning on digital twins and the IoT, the possibilities for real-time smart monitoring get very interesting.

Imagine a large corporate campus that has been turned into an enormous digital twin that expands to other campuses and physical locations. What if that digital twin uses machine learning to optimize things like traffic, utilities, and weather? How could a global company use digital twins to have a complete model of the physical world?

Here is our biggest tip for anyone considering digital twins as part of a project strategy:

We like to start by considering the existing tools. A robust set of tools already exists from companies like Microsoft and Amazon Web Services (AWS IoT TwinMaker), both of which are Valence partners.

Leverage existing industry ontologies (data dictionaries), such as schemas, naming systems, and data formats for interchange within communities. You’ll benefit from established best practices and from broader interoperability between third-party vendors.

Microsoft has contributed industry standards built on its Digital Twins Definition Language (DTDL) that make it simpler to build, use, and maintain digital twins.

The underlying services are provisioned automatically so developers can build upon a platform of services and extend the existing Microsoft or Amazon product. The process isn’t turnkey, and you won’t be able to create a digital twin using completely out-of-the-box tools, but the platform is managed for you, which lowers operating costs. The platforms are also more secure and designed with operational best practices in mind, such as automatic backup and built-in deployment automation.

Building upon industry standards will also save you time. For example, if you want to create a smart building solution and need to describe a building’s physical space, industry standards will help since software developers don’t usually have a facilities or building management background. An industry-standard model gives developers an advantage when creating a digital twin that their clients can understand and use.

Data-driven solution

Digital twins create a platform to measure and store data. With the data available, you can test and answer both operational and business questions. For example, you can investigate fragile or risky components in your supply and production system and explore opportunities to improve existing services and add new ones. The key is that measuring and storing the data are essential steps before using any analytical tool.

Digital Twins are Evolving

While building a digital twin is more difficult than what can be done by a typical business user, we can develop these complex systems with a modest team of developers and designers. We typically only need to bring in highly specialized engineers when there are heavy integration and interoperability challenges with several vendors.

The technology is evolving, and early-stage challenges with vendor integration will improve over time, making it easier to transition a digital twin solution from one cloud provider to another.

One of the keys to digital transformation is challenging how we do things today to explore how to get more computerization and automation involved. Can digital twins improve your organization’s warehousing and distribution? Can digital twins improve the challenges faced in the supply chain? Can your sustainability goals be tested with a digital twin? There are many possibilities to consider!

Evolve with JumpStart

Partner with Kopius and JumpStart your future success.

Training the Machines: An Introduction To Types of Machine Learning

Posted on September 29, 2021 by Kopius

by Yuri Brigance

I previously wrote about deep learning at the Edge. In this post, I’m going to describe the process of setting up an end-to-end Machine Learning (ML) workflow for different types of machine learning.

There are three common types of machine learning training approaches, which we will review here:

Supervised
Unsupervised
Reinforcement

And since all learning approaches require some type of training data, I will also share three methods to build out your training dataset via:

Human Annotation
Machine Annotation
Synthesis / Simulation

Supervised Learning:

Supervised learning uses a labeled training set of both inputs and outputs to teach a model to yield the desired outcome. This approach typically relies on a loss function, which measures how far the model’s predictions are from the labeled outputs. Training continues until that error has been sufficiently minimized.

This type of learning approach is arguably the most common, and in a way, it mimics how a teacher explains the subject matter to a student through examples and repetition.

One downside to supervised learning is that this approach requires large amounts of accurately labeled training data. This training data can be annotated manually (by humans), annotated by machine (by other models or algorithms), or generated synthetically (for example, rendered images or simulated telemetry). Each approach has its pros and cons, and they can be combined as needed.
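To make the supervised loop concrete, the sketch below fits a straight line to a handful of labeled points by repeatedly lowering a mean squared error loss with gradient descent. It is only an illustration: the data, the learning rate, and the function names `mse_loss` and `train` are assumptions made for this example, and real projects would normally use an ML library instead.

```python
def mse_loss(w, b, xs, ys):
    # Mean squared error between predictions (w * x + b) and labels.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, steps=2000):
    # Plain gradient descent on the MSE loss for a one-feature linear model.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Labeled training set: inputs xs with known outputs ys (here y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train(xs, ys)
print("learned weight and bias:", round(w, 3), round(b, 3))
print("final loss:", mse_loss(w, b, xs, ys))
```

The labels play the role of the teacher: every update nudges the model toward the answers it was shown.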

Unsupervised Learning:

Unlike supervised learning, where a teacher explains a concept or defines an object, unsupervised learning gives the machine the latitude to develop understanding on its own. Often with unsupervised learning, the machines can find trends and patterns that a person would otherwise miss. Frequently these correlations elude common human intuition and can be described as non-semantic. For this reason, the term “black box” is commonly applied to such models, including the awe-inspiring GPT-3.

With unsupervised learning, we give data to the machine learning model that is unlabeled and unstructured. The computer then identifies clusters of similar data or patterns in the data. The computer might not find the same patterns or clusters that we expected, as it learns to recognize the clusters and patterns on its own. In many cases, being unrestricted by our preconceived notions can reveal unexpected results and opportunities.
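One common way to find such clusters is k-means. The following is a small, one-dimensional sketch written from scratch for illustration. The function name `kmeans_1d`, the evenly spaced starting centers, and the sample values are assumptions for this example, not a description of any particular library.

```python
def kmeans_1d(values, k=2, iterations=10):
    # Group unlabeled numbers into k clusters around k learned centers.
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1)
    # Assumed initialization: centers spread evenly across the sorted data.
    centers = [ordered[round(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# No labels are supplied; the grouping emerges from the data alone.
data = [1.0, 9.5, 2.0, 10.0, 1.5, 9.0]
centers, clusters = kmeans_1d(data, k=2)
print("centers:", centers)
print("clusters:", clusters)
```

Note that nothing tells the algorithm what the two groups mean. Interpreting them is still left to a person.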

Reinforcement Learning:

Reinforcement learning teaches a machine to act through feedback rather than labeled answers, which places it somewhere between supervised and unsupervised learning. The machine is rewarded for correct actions, and it is built to collect as much reward as possible. Reinforcement learning is an efficient way to train a machine on a complicated task, such as playing video games or walking on robotic legs.

The machine is motivated to be rewarded, but the machine doesn’t share the operator’s goals. So if the machine can find a way to “game the system” and get more reward at the cost of accuracy, it will greedily do so. Just as machines can find patterns that humans miss in unsupervised learning, machines can also find missed patterns in reinforcement learning, and exploit those invisible patterns to receive additional reinforcement. This is why your experiment needs to be airtight to minimize exploitation by the machines.

For example, an AI twitterbot that was trained with reinforcement learning was rewarded for maximizing engagement. The twitterbot learned that engagement was extremely high when it posted about Hitler.

This machine behavior isn’t always a problem. For example, reinforcement learning helps machines find bugs in video games that can be exploited if they aren’t resolved.
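The reward-seeking behavior described in this section can be sketched with a multi-armed bandit, one of the simplest reinforcement learning settings. The example below uses an epsilon-greedy strategy. The reward probabilities, the `run_bandit` name, and the parameter values are assumptions chosen purely for illustration.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    # Each "arm" is an action that pays a reward of 1 with some probability.
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was taken
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: take the action with the best estimated reward.
            action = max(range(len(values)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best = counts.index(max(counts))
print("estimated rewards:", [round(v, 2) for v in values])
print("most-used action:", best)
```

The agent only ever sees the reward signal, so it will favor whichever action pays best, whether or not that matches what the operator intended. That is why the reward needs to be designed carefully.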

Datasets:

Machine Learning implies that you have data to learn from. The quality and quantity of your training data have a lot to do with how well your algorithm can perform. A training dataset typically consists of samples, or observations. Each training sample can be an image, audio clip, text snippet, sequence of historical records, or any other type of structured data. Depending on which machine learning approach you take, each sample may also include annotations (correct outputs / solutions) that are used to teach the model and verify the results. Training datasets are commonly split into groups where the model only trains on a subset of all available data. This allows a portion of the dataset to be used for validation of the model, to ensure that the model has generalized well enough to perform well on data it has not seen before.
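A minimal sketch of such a split is shown below. The 80/20 ratio, the fixed seed, and the helper name `train_validation_split` are assumptions for this example. Libraries such as scikit-learn provide their own utilities for the same job.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    # Shuffle a copy so the original order does not leak into the split.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    # The model trains on the first list and is checked on the second.
    return shuffled[n_val:], shuffled[:n_val]


# Each sample pairs an input with its annotation (the correct output).
samples = [(x, 2 * x) for x in range(10)]
train_set, validation_set = train_validation_split(samples)
print("training samples:", len(train_set))
print("validation samples:", len(validation_set))
```

Because the validation samples are never used for training, the model’s score on them gives a fairer estimate of how it will behave on new data.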

Regardless of which training approach you take, your model can be prone to bias which may be inadvertently introduced through unbalanced training data, or selection of the wrong inputs. One example is an AI criminal risk assessment tool used by courts to evaluate how likely a defendant is to reoffend based on their profile as input. Because the model was trained on historical data, which included years of disproportionate targeting by law enforcement of low-income and minority groups, the resulting model produced higher risk scores for low-income and minority individuals. It is important to remember that most machine learning models pick up on statistical correlations, and not necessarily causations.

Therefore, it is highly desirable to have a large and balanced training dataset for your algorithm, which is not always readily available or easy to obtain. This is a task which may initially be overlooked by businesses excited to apply machine learning to their use cases. Dataset acquisition is as important as the model architecture itself.
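A quick way to spot an unbalanced dataset before training is to count the labels. The sketch below is only an illustration: the label names and the 80/20 mix are made up, and a real bias audit would look at far more than class counts.

```python
from collections import Counter

def class_balance(labels):
    """Return each label's share of the dataset and the largest/smallest ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio

# Hypothetical label column with a 4:1 imbalance.
labels = ["low_risk"] * 80 + ["high_risk"] * 20
shares, ratio = class_balance(labels)
print(shares, ratio)
```

A ratio far from 1 is a hint to collect or generate more samples of the under-represented class before trusting the model.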

One way to ensure that the training dataset is balanced is through a Design of Experiments (DOE) approach, where controlled experiments are planned and analyzed to evaluate the factors which control the value of an output parameter or group of parameters. DOE allows multiple input factors to be manipulated to determine their effect on the model’s response. This gives us the ability to exclude certain inputs which may lead to biased results, and to gain a better understanding of the complex interactions that occur inside the model.
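One common DOE layout is a full factorial design, in which every combination of factor levels is run once. The sketch below assumes three hypothetical factors for an image dataset; the factor names and levels are placeholders, and other designs (fractional factorial, for instance) may suit a real project better.

```python
from itertools import product

# Hypothetical input factors and the levels to test for each.
factors = {
    "lighting": ["dim", "bright"],
    "camera_angle_deg": [0, 45, 90],
    "background": ["plain", "cluttered"],
}

def full_factorial(factors):
    """Enumerate every combination of factor levels as one experiment run."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]

runs = full_factorial(factors)
print(len(runs), runs[0])
```

Because every level appears equally often, the resulting set of runs is balanced across the chosen factors by construction.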

Here are three examples of how training data is collected, and in some cases generated:

1. Human Labeled Data:

What we refer to as human-labeled data is anything that has been annotated by a living human, either through crowdsourcing or by querying a database and organizing the dataset. An example of this could be annotating facial landmarks around the eyes, nose, and mouth. These annotations are pretty good, but in certain instances can be imprecise. For example, the definition of “the tip of the nose” can be interpreted differently by different humans who are tasked with labeling the dataset. Even simple tasks, like drawing a bounding box around apples in photos, can have “noise” because the bounding box may have more or less padding, may be slightly off center, and so on.

Human-labeled data is a great start if you have it. But hiring human annotators can be expensive, and their work is prone to error. Various services and tools exist, from Amazon SageMaker Ground Truth to several startups, which make the labeling job easier for the annotators and also connect annotation vendors with clients.

It might be possible to find an existing dataset in the public domain. In an example with facial landmarks, we have WFLW, iBUG, and other publicly available datasets which are perfectly suitable for training. Many have licenses that allow commercial use. It’s a good idea to research whether someone has already produced a dataset that fits your needs, and it might be worth paying for a small dataset to bootstrap your learning process.

2. Machine Annotation:

In plain terms, machine annotation is where you take an existing algorithm or build a new algorithm to add annotations to your raw data automatically. It sounds like a chicken and egg situation, but it’s more feasible than it initially seems.

For example, you might already have a partially labeled dataset. Let’s imagine you are labeling flowers in bouquet photos, and you want to identify each flower. Maybe you had some portion of these images already annotated with tulips, sunflowers, and daffodils. But there are still images in the training dataset that contain tulips which have not been annotated, and new images keep coming in from your photographers.

So, what can you do? In this case, you can take all the existing images where the tulips have already been annotated and train a simple tulip-only detector model. Once this model reaches sufficient accuracy, you can fill in the remaining missing tulip annotations automatically. You can keep doing this for the other flowers. In fact, you can crowdsource humans to annotate just a small batch of images with a specific new flower, and that should be enough to build a dedicated detector that can machine-annotate your remaining samples. In this way, you save time and money by not having humans annotate every single image in your training set or every new raw image that comes in. The resulting dataset can be used to train a more complete production-grade detector, which can detect all the different types of flowers. Machine annotation also gives you the ability to continue improving your production model by continuously and automatically annotating new raw data as it arrives. This achieves a closed-loop continuous training and improvement cycle.

Another example is where you have incompatible annotations. For example, you might want to detect 3D positions of rectangular boxes from webcam images, but all you have are 2D landmarks for the visible box corners. How do you estimate and annotate the occluded corners of each box, let alone figure out their position in 3D space? Well, you can use a Principal Component Analysis (PCA) morphable model of a box and fit it to the 2D landmarks, then de-project the detected 3D shape into 3D space using camera intrinsics. This gives you full 3D annotations, including the occluded corners. Now you can train a model that does not require PCA fitting.
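The de-projection step can be sketched with the standard pinhole camera model. This is only an illustration under stated assumptions: the intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up values, and the depth of each corner is assumed to be known already, for example from the fitted box model. The PCA fitting itself is not shown.

```python
def deproject(u, v, depth, fx, fy, cx, cy):
    """Map a pixel (u, v) at a known depth to a 3D point in camera space."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics for a 640x480 webcam.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# A pixel at the image center lies on the optical axis.
center_point = deproject(320, 240, 2.0, FX, FY, CX, CY)
# A pixel 600 px to the right of center, at 2 m depth, sits 2 m to the right.
side_point = deproject(920, 240, 2.0, FX, FY, CX, CY)
print(center_point, side_point)
```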

In many cases you can put together a conventional deterministic algorithm to annotate your images. Sure, such algorithms might be too slow to run in real-time, but that’s not the point. The point is to label your raw data so you can train a model that runs inference in milliseconds.

Machine annotation is an excellent choice to build up a huge training dataset quickly, especially if your data is already partially labeled. However, just like with human annotations, machine annotation can introduce errors and noise. Carefully consider which annotations should be thrown out based on a confidence metric or some human review, for example. Even if you include a few bad samples, the model will likely generalize successfully with a large enough training set, and bad samples can be filtered out over time.
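The idea of keeping only confident machine annotations can be sketched as follows. Everything here is a stand-in: `toy_tulip_detector` returns hard-coded scores in place of a trained model, and the 0.9 threshold is an arbitrary assumption that would need tuning on real data.

```python
def machine_annotate(unlabeled, predict, min_confidence=0.9):
    """Auto-label confident samples; route the rest to human review."""
    accepted, needs_review = [], []
    for sample in unlabeled:
        label, confidence = predict(sample)
        if confidence >= min_confidence:
            accepted.append((sample, label))
        else:
            needs_review.append(sample)
    return accepted, needs_review

def toy_tulip_detector(sample):
    """Hypothetical stand-in for a trained detector: (label, confidence)."""
    scores = {
        "img_001": ("tulip", 0.97),
        "img_002": ("tulip", 0.55),
        "img_003": ("no_tulip", 0.93),
    }
    return scores[sample]

accepted, needs_review = machine_annotate(
    ["img_001", "img_002", "img_003"], toy_tulip_detector
)
print(accepted, needs_review)
```

Samples in `needs_review` could go to human annotators, and their corrected labels can feed the next training round.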

3. Synthetic Data:

With synthetic data, machines are trained on renderings or in hyper-realistic simulations – think of a video game of a city commute, for example. For Computer Vision applications, a lot of synthetic data is produced via rendering, whether you are rendering people, cars, entire scenes, or individual objects. Rendered 3D objects can be placed in a variety of simulated environments to approximate the desired use case. We’re not limited to renderings either, as it is possible to produce synthetic data for numeric simulations where the behavior of individual variables is well known. For example, modeling fluid dynamics or nuclear fusion is extremely computationally intensive, but the rules are well understood – they are the laws of physics. So, if we want to approximate fluid dynamics or plasma interactions quickly, we might first produce simulated data using classical computing, then feed this data into a machine learning model to speed up prediction via ML inference.

There are many examples of commercial applications of synthetic data. For example, what if we needed to annotate the purchase receipts for a global retailer, starting with unprocessed scans of paper receipts? Without any existing metadata, we would need humans to manually review and annotate thousands of receipt images to assess buyer intentions and semantic meaning. With a synthetic data generator, we can parameterize the variations of a receipt and accurately render them to produce synthetic images with full annotations. If we find that our model is not performing well under a particular scenario, we can just render more samples as needed to fill in the gaps and re-train.
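A parameterized receipt generator might start out like the sketch below. It is deliberately simplified: it produces plain text rather than rendered images, and the item catalog, prices, and layout are all hypothetical. The part worth noticing is that each sample is created together with its annotation, so no manual labeling is needed.

```python
import random

# Hypothetical product catalog with unit prices.
ITEMS = {"coffee": 3.50, "bagel": 2.25, "juice": 4.00}

def make_receipt(rng):
    """Generate one synthetic receipt (text) together with its annotations."""
    lines, annotations = [], []
    total = 0.0
    n_items = rng.randint(1, 3)
    for name in rng.sample(sorted(ITEMS), n_items):
        qty = rng.randint(1, 4)
        line_total = ITEMS[name] * qty
        total += line_total
        lines.append(f"{qty} x {name:<8}{line_total:>7.2f}")
        annotations.append({"item": name, "quantity": qty, "line_total": line_total})
    lines.append(f"TOTAL {total:>12.2f}")
    return "\n".join(lines), {"items": annotations, "total": round(total, 2)}

rng = random.Random(7)  # fixed seed so the dataset can be regenerated
dataset = [make_receipt(rng) for _ in range(100)]
print(dataset[0][0])
```

If the model struggles with a particular case, such as long receipts, the generator's parameters can be adjusted to produce more of those samples.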

Another real-world example is in manufacturing where “pick-and-place” robots use computer vision on an assembly line to pack or arrange and assemble products and components. Synthetic data can be applied in this scenario because we can use the same 3D models that were used to create injection molds of the various components to make renderings as training samples that teach the machines. You can easily render thousands of variations of such objects being flipped and rotated, as well as simulate different lighting conditions. The synthetic annotations will always be 100% precise.

Aside from rendering, another approach is to use Generative Adversarial Network (GAN) generated imagery to create variation in the dataset. Training GAN models usually requires a decent number of raw samples. With a fully trained GAN generator it is possible to explore the latent space and tweak parameters to create additional variation. Although they are more complex than classical rendering engines, GANs are gaining steam and have their place in the synthetic data generation realm.

Choosing the right approach:

Machine learning is on the rise across industries and in businesses of all sizes. Depending on the type of data, the quantity, and how it is stored and structured, Valence can recommend a path forward which might use a combination of the data generation and training approaches outlined in this post. The order in which these approaches are applied varies by project, and boils down to roughly four phases:

Bootstrapping your training process. This includes gathering or generating initial training data and developing a model architecture and training approach. Some statistical analysis (DOE) may be involved to determine the best inputs to produce the desired outputs and predictions.
Building out the training infrastructure. Access to Graphics Processing Unit (GPU) compute in the cloud can be expensive. While some models can be trained on local hardware at the beginning of the project, long-term a scalable and serverless training infrastructure and proper ML experiment lifecycle management strategy is desirable.
Running experiments. In this phase we begin training the model, adjusting the dataset, experimenting with the model architecture and hyperparameters. We will collect lots of experiment metrics to gauge improvement.
Inference infrastructure. This includes integrating the trained model into your system and putting it to work. This can be cloud-based inference, in which case we’ll pick the best serverless approach that minimizes cloud expenses while maximizing throughput and stability. It might also be edge inference, in which case we may need to optimize the model to run on a low-powered edge CPU, GPU, TPU, VPU, FPGA, or a combination thereof.

What I wish every reader understood is that these models are simple in their sophistication. There is a discovery process at the onset of every project where we identify the training data needs and which model architecture and training approach will get the desired result. It sounds relatively straightforward to unleash a neural network on a large amount of data, but there are many details to consider when setting up Machine Learning workflows. Just like real-world physical research, Machine Learning requires us to set up a “digital lab” which contains the necessary tools and raw materials to investigate hypotheses and evaluate outcomes, which is why we call AI training runs “experiments”. Machine Learning has such an array of truly incredible applications that there is likely a place for it in your organization as part of your digital journey.

JumpStart Your Machine Learning Success

Kopius supports businesses seeking to govern and utilize AI and ML to build for the future. We’ve designed a program to JumpStart your customer, technology, and data success.

Tailored to your needs, our user-centric approach, tech smarts, and collaboration with your stakeholders equip teams with the skills and mindset needed to:

Identify unmet customer, employee, or business needs
Align on priorities
Plan & define data strategy, quality, and governance for AI and ML
Rapidly prototype data & AI solutions
And, fast-forward success

Partner with Kopius and JumpStart your future success.


Training Data Sets in Machine Learning Models

Posted on August 12, 2019 by Kopius

By Yuri Brigance

I have a particular interest in how to train data sets in machine learning models.

This year’s TC Robotics & AI conference had all the proof we need that consumer robotics, powered by the latest Machine Learning science, is quickly becoming a booming industry with lots of investor interest behind it. New Machine Learning (ML) architectures and training techniques are coming out almost every month. It was interesting to see how these algorithms are being used to create a new wave of consumer tech, as well as large numbers of service offerings springing up to make machine learning more user-friendly. How we train data sets in machine learning models is increasingly important.


Training Data Sets

Building training data sets for machine learning is one of the most visible priorities in this new and growing ecosystem.

Machine Learning relies on A LOT of training data. Creating it is no easy feat. Much of it requires manual human effort to correctly label. A lot of companies have sprung up to help address this problem, and make data collection and labeling faster and easier, in some ways automating it completely.

Aside from labeling, collecting such training data can be just as difficult. Self-driving cars are a well-known example: we’ve all heard of, and maybe even seen, autonomous vehicles being tested on public roads. However, it might come as a surprise that most of those driving miles aren’t used for training data collection.

As Sterling Anderson of Aurora and Raquel Urtasun of Uber explained, most self-driving technologies are actually trained in simulation. The autonomous fleets are out testing the trained models in the real world. On occasion the system will disengage and flag a new scenario. The disengagement condition is then permuted thousands of times and becomes part of the simulation, providing millions of virtual miles for training purposes. It’s cost efficient, scalable, and very effective.

Creating such simulations is not trivial. In order to provide the right fidelity, not only must the virtual world look visually hyper-realistic, but all the sensor data (lidar, radar, and a hundred others) must also be perfectly synced to the virtual environment. Think flight simulator, but with much better graphics. In many cases, sensor failures can be simulated as well, and self-driving systems need to be able to cope with the sudden loss of input data.

Visual data is notoriously difficult to label. Simulation aside, imagine if you are tasked with outlining all the cars, humans, cats, dogs, lamp posts, trees, road markings, and signs in a single image. And there are tens of thousands of images to go through. This is where companies like SuperAnnotate and ScaleAI come in.

SuperAnnotate provides a tool that combines superpixel-based segmentation with humans in the loop to allow for rapid creation of semantic segmentation masks. Imagine a drone orthomosaic taken over a forest with a variety of tree species — tools like this allow a human to quickly create outlines around the trees belonging to a specific category simply by clicking on them.

SuperAnnotate’s approach is interesting, but it likely won’t be sufficient for all scenarios. It’s useful for situations where you have well defined contrasting edges around the objects you are attempting to segment out, but it would likely not work so well for less defined separation lines. A good example is when you may want to figure out where the upper lip ends and the upper gum begins in portraits of smiling people. This will likely require a custom labeling tool — something we at Kopius have created on a number of occasions.

ScaleAI takes a different approach, and relies on a combination of statistical tools, machine learning checks, and most importantly, humans. This is a very interesting concept — effectively a Mechanical Turk for data labeling.

So it is quickly becoming apparent that data collection and training are whole separate pillars of the ML-powered industry. One might imagine a future where the new “manual labor” is labeling or collecting data. This is a fascinating field to watch, as it provides us with a glimpse of the kinds of new jobs available for folks who are now under threat of unemployment via automation. With one caveat — these systems are distributed, so even if you get a gig as a human data labeler, you may be competing with folks from all over the world, which has immediate income implications.

On the other hand, setting up simulations and figuring out ways to collect “difficult” data may be an entire engineering vertical on its own. If you are a video game, AR/VR, or general 3D artist or developer, you might find your skills very applicable in the AI/ML world. A friend of mine recently found an app that allows you to calculate your Mahjong score by taking a photo of your tiles. How would you train a model to recognize these tiles from a photo, in various lighting conditions and from all angles? You could painstakingly take photos of the tiles and try to label them yourself, or you could hire a 3D artist to model the tiles in 3D. Once you have realistic 3D models, you can spin up a number of EC2 instances running Blender (effectively a “render farm” in the cloud). Using Python, you can then programmatically script various scenes (angles, lights, etc.) and use Blender’s ray-tracing engine to crank out thousands of pre-labeled 3D renders of simulated tiles in all sorts of positions, angles, colors, etc.
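The scene-scripting step can be sketched without any rendering code. The snippet below only samples scene parameters and pairs each one with its label; the tile names, parameter names, and ranges are hypothetical, and the Blender-specific calls that would consume these parameters are intentionally left out.

```python
import random

# Hypothetical subset of tile classes to render.
TILE_NAMES = ["bamboo_1", "circle_5", "red_dragon"]

def sample_scene(rng):
    """Pick random camera and lighting settings for one labeled render."""
    return {
        "label": rng.choice(TILE_NAMES),
        "camera_elevation_deg": rng.uniform(20.0, 90.0),
        "camera_azimuth_deg": rng.uniform(0.0, 360.0),
        "light_strength": rng.uniform(0.5, 2.0),
    }

rng = random.Random(0)  # fixed seed so the same scenes can be re-rendered
scenes = [sample_scene(rng) for _ in range(1000)]
print(scenes[0])
```

Each dictionary would then be handed to a render script, and because the label is chosen before rendering, every output image arrives pre-labeled.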

But what if your task is to detect weather conditions (wind, rain, hail, thunder, snow) via a small IoT device with just a cheap microphone as a sensor? Where do you get all the training sounds to create your model? Scraping YouTube for sound can only get you so far. After all, those sounds are recorded with different microphones, background noises, and varying conditions. In this case, you may opt to create physical devices designed specifically for this kind of data collection. These may be expensive but might contain the required set of sensors to accurately record and label the sound you’re looking for, using the microphones you’ll use in production. Once the data is collected, you can train a model and run inference on a cheap edge device. Coming up with such data collection techniques can be an engineering field of its own, and execution requires manual labor to deploy these techniques in the field. It’s an interesting engineering problem, one that will undoubtedly give birth to a number of specialized service and consulting startups.

Here at Kopius we have the necessary talent to collect the data you need, either via crowdsourcing, simulating (we do AR/VR in-house and have talented 3D artists), using existing labeling tools, building custom labeling tools, or constructing physical devices to collect field data. We’re able to set up the necessary infrastructure to continuously re-train your model in the cloud and automatically deploy it to production, providing a closed-loop cycle of continuous improvement.


Artificial Intelligence: How Smart Is It?

Posted on May 3, 2018 by Kopius

How Smart is AI?

So are computers ready to take over the world and subjugate the human race, given our inferior intelligence and processing power? How smart is artificial intelligence?

That’s been a major Hollywood theme for decades. Who doesn’t remember the chilling lines in 2001: A Space Odyssey, “I’m sorry, Dave. I’m afraid I can’t do that,” when it suddenly becomes clear that the supercomputer HAL has gone rogue?

Or the ominous scene in Blade Runner, when escaped “replicant” Leon shoots the police officer administering a diagnostic test of his humanity (or, in this case, lack of humanity).

Although there are real concerns about setting AI free in the world, much of the media-hyped fear about the coming AI apocalypse is overblown. And even if there are valid technological and ethical considerations, the technology is still a long way off from that point.

Here’s how Andrew Ng, the chief scientist at Baidu from 2014 to 2017, put it in an interview with Vox earlier this year: “Worrying about evil-killer AI today is like worrying about overpopulation on the planet Mars. Perhaps it’ll be a problem someday, but we haven’t even landed on the planet yet.” (He does believe we should be thinking about how AI will displace the workforce of tomorrow, though.)

The Power of Artificial Intelligence and Machine Learning

The reality is that artificial intelligence and machine learning (let’s add some acronyms: AI and ML), are incredibly powerful technologies. They are able to find patterns in mind-boggling quantities of data orders of magnitude faster than humans. Plus they can learn to recognize objects and predict outcomes, and they get better at that over time. So while they are not likely to turn into evil killers of humanity, they are likely to transform, well, everything.

They will absolutely change the way we interact with the world — through Natural Language Processing (Hey, Siri, take me home.) and computer vision systems. Soon enough we will be able to initiate voice commands like “OK, Google, take me to the mountain in this photo.” Already Facebook can tag you practicing that embarrassing dance move at your best friend’s bachelor party. Or giving the keynote at an industry convention, for that matter.

AI Applications Across Industries

AI and ML will automate many of the boring or repetitive tasks that people perform now, which will transform the future workforce. Think of virtual assistants who schedule meetings and send automatic follow-up messages or appointment reminders.

We already have giant industrial robots that manufacture cars, and they are only getting smarter — like knowing how to avoid injuring people or even scheduling their own tune-ups so they don’t break down and cause expensive, disruptive work stoppages.

AI in the Automotive Industry

Autonomous vehicles rely on many of these AI systems strung together: a series of sensors — including video cameras, LIDAR, sonar, and motion sensors — detect the environment and feed that data to the car’s processing systems, which then analyze and act in real-time. The technology is marching forward at breakneck speed, with VC investments and high-profile acquisitions constantly making the news.

Although fully autonomous vehicles are many years off, multiple features that use AI tech are completely functional in cars on the road right now. These include adaptive cruise control, automatic emergency braking, lane departure warning, lane keeping assist, and front collision warning systems, to name a few.

AI Applications in Healthcare

Healthcare is also an area of incredible promise when it comes to AI and machine learning. IBM’s Watson Health mines health data to find patterns that no human mind would be powerful enough to recognize. This will help speed drug discovery, detect insurance fraud, and create personalized plans to keep people healthy, among other innovations.

Accessible Cloud-Based AI Innovations

It all sounded so sci-fi only a few decades ago. But now AI is upon us, and the speed of discovery is accelerating. That’s in part thanks to the availability of cloud-based AI services like Amazon’s Lex and Rekognition, which enable you to add voice and video recognition into your own systems. Or Microsoft’s AI services, which let you add analytics, speech recognition, and machine learning. Or the ubiquitous Google Translate, which can translate text, entire web pages, and even the writing on the outside of packaged goods in multiple languages all over the world. What used to be open only to Google, Facebook, and world superpowers is now accessible to everyone.


Chatbots: Much More Than A Novelty

Posted on December 1, 2017 by Kopius


The promise of Artificial Intelligence and chatbots is here.

Sure, humanoid robots aren’t yet roaming the earth, but AI-powered applications and AI-infused services are transforming the world around us into a more intelligent, interactive, and empowered domain. Looking for a good example? Ask Siri, Alexa, Cortana, or Cleverbot. They, collectively, are the answer.

Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, and Cleverbot are all examples of chatbots: “a computer program which conducts a conversation via auditory or textual methods.” Some chatbots use natural language processing to understand your speech and then respond verbally. Apple’s Siri is perhaps the most famous example of this type of chatbot, though Alexa and Cortana are also widely used. Other chatbots are text-based, responding to typed questions, commands, or observations. Microsoft’s Xiaoice, for example, was released in China in 2014 and, only a year later, had already been used by over 40 million smartphone owners (25% of whom had reportedly said “I love you” to their “virtual friend,” which is available on China’s two most prominent social media platforms, Weibo and WeChat).

Chatbots have been the subject of controversy (see Microsoft’s Tay) and frequent comic derision (see, e.g., Siri). More generally, many people see them as little more than a novelty, a fun way for consumers to interact with technology. But they are much, much more than that. Simply put, chatbots are a powerful example of the proliferation of Artificial Intelligence into mainstream society. And we are just scratching the surface of their capabilities.

To date, the landscape of chatbots available for consumers and enterprises has been dominated largely by the tech titans mentioned above. It is, though, in the process of becoming significantly more diverse and dynamic, a phenomenon driven by the release of numerous chatbot frameworks for developers.

Chatbot frameworks are essentially software development kits (SDKs) for the AI-verse. They provide a platform — the technology infrastructure — for developers to build chatbots in a manner which meets their needs. The release of frameworks like Microsoft’s Bot Framework and Facebook’s Bot Engine (wit.ai) means that any developer, be they a hobbyist or professional service provider, can build a chatbot to improve their life or the lives of those around them.

Want to build a chatbot that speaks to you in Captain Hook lingo in time for the annual Talk Like a Pirate Day (September 19)? Have at it! Think your business can benefit from a chatbot designed to provide a more intuitive way to access and organize the data that fuels your success? Build it!

…or let us build it! Valence understands that chatbots are more than a novelty; they are a paradigm-shifting technology that can digitally transform businesses in any sector. That’s why we’re putting them to work for our clients in ways that support both their strategic objectives and their day-to-day tactics. And that’s why we’re looking forward to learning how we can put them to work for you.
