Data Lakes: The Foundation of Big Data Analytics


By Kristina Scott

Data lakes are flexible and scalable architectures that are changing how businesses store and process data.

Businesses are generating and using more data than ever. This data comes from a variety of sources, such as customer interactions, social media, and IoT devices, among others. And it is often stored in different formats, making it challenging to use the data, analyze it, and gain insights. That’s where data lakes come in.

Data lakes are new to the world of big data analytics, and they are rapidly becoming the right choice for organizations. According to a report by MarketsandMarkets, the data lakes market is expected to grow from $7.9 billion in 2019 to $20.1 billion by 2024, at a compound annual growth rate of 20.6%.

Let’s dive deeper into the purpose of data lakes, explore their benefits, and look into their future.

What is a Data Lake?

The term “data lake” emerged around 2010, alongside the rise of Apache Hadoop, as an alternative to the limitations of data warehouses. A data lake is a storage system that allows you to store vast amounts of unstructured, semi-structured, and structured data at a low cost.

Simply put, a data lake is a large repository that stores raw data in its native format. Compared to a data warehouse, which stores data in hierarchical files or folders, a data lake uses a flat architecture and object storage. While traditional data warehouses provide businesses with analytics, they are expensive, rigid, and often not equipped for the use cases companies have today, which is why the demand for data lakes is increasing.

Data lakes consolidate data in a central location where it can be stored as is, without the need to implement any formal structure for how the data is organized. That eliminates the need for preprocessing or transformation of data before storing it, making it an ideal storage solution for a vast amount of data. This raw data can then be processed and analyzed using a range of tools and technologies, such as machine learning algorithms, data visualization, and statistical analysis. Data lakes are typically built on the Hadoop Distributed File System (HDFS) or on cloud object storage, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
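The idea of storing data as is and applying structure later is often called schema-on-read. Here is a minimal sketch in Python, assuming a local temporary directory stands in for object storage such as Amazon S3. The file names, the fields, and the land_raw and read_orders helpers are hypothetical and exist only to illustrate the pattern:

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake: Path, name: str, payload: str) -> Path:
    """Write a payload to the lake unchanged, in its native format."""
    path = lake / name
    path.write_text(payload, encoding="utf-8")
    return path


def read_orders(lake: Path) -> list[dict]:
    """Apply a common structure only at read time (schema-on-read)."""
    orders = []
    for path in sorted(lake.iterdir()):
        if path.suffix == ".json":
            for record in json.loads(path.read_text(encoding="utf-8")):
                orders.append(
                    {"customer": record["customer"], "amount": float(record["amount"])}
                )
        elif path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    orders.append(
                        {"customer": row["customer"], "amount": float(row["amount"])}
                    )
    return orders


# A temporary directory plays the role of the lake (assumption for illustration).
lake_dir = Path(tempfile.mkdtemp())
land_raw(lake_dir, "web_orders.json", '[{"customer": "ana", "amount": 12.5}]')
land_raw(lake_dir, "store_orders.csv", "customer,amount\nben,7.25\n")

orders = read_orders(lake_dir)
total = sum(order["amount"] for order in orders)
print(orders, total)
```

Nothing is transformed when the files land. The JSON and CSV sources are reconciled into one shape only when someone reads them, which is what lets a lake accept new formats without upfront modeling.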

Why Do Data Lakes Matter?

Often, a business has had big data and just didn’t know it. For instance, data goes unused because current business requirements only use a subset of the data a client or partner exchanges. Data lakes allow a business to consume and ingest vast amounts of raw data, allowing for data discovery in a cheap, efficient, and measurable way. Data-driven businesses tend to focus on future business needs, which require new insights into existing data and using newer technologies such as machine learning for predictive analysis.

Further, data lakes enable organizations to democratize big data access, making data-driven decisions a reality. The most significant advantage of data lakes is that they allow organizations to analyze data more effectively and gain insights faster to empower decision-making.  

Data lakes enable businesses to become more data-driven, as they can access and analyze big data quickly and efficiently, shifting the culture to embrace data-driven thinking across the organization. And it pays off: a Deloitte survey found that companies with the strongest culture around data-driven insights and decision-making were twice as likely to significantly exceed business goals. Data lakes enable that big data-driven culture to thrive and be accessible at all levels of the organization. BCC Research also found that companies that use data lake services outperform similar companies by 9% in organic revenue growth.

How are Companies Using Data Lakes?

“The one thing I wish more people knew about data lakes is that it’s a tool that has great potential but can be misused. It’s vital to have a strategy to keep your data organized and avoid turning your lake into a swamp.”

Michael Rounds, Director of Data Engineering and Analysis, Kopius

Companies across a variety of industries are using data lakes to gain insights, improve operations and gain a competitive edge. In a research survey by TDWI, 64% of organizations said that the main purpose and benefit of a unified data lake is being able to get more operations and analytics business value from data. Other top value adds include reducing silos, gaining a better foundation for analytics compared to traditional data types, and storage and cost savings benefits.

Here are some practical use-case examples of organizations implementing data lakes in business operations:

  • Retailers use data lakes to analyze customer behavior and purchase history to offer personalized recommendations and promotions.
  • Healthcare organizations leverage data lakes to store patient data from multiple sources, such as electronic health records and wearables, to better diagnose and treat diseases.
  • Manufacturers implement data lakes to monitor and optimize production processes and analyze product performance, thus reducing operational costs.
  • Financial institutions use data lakes to gain deeper insights into customers’ behaviors, analyze and detect fraudulent activities, improve risk management, and improve customer experience.

Overall, data lakes help companies make more informed decisions. By storing all their data in one central location, companies can find patterns and trends that were previously hidden. They are empowered to democratize data access, becoming more data-driven, agile, and competitive.

What is the Future of Data Lakes?

The future of data lakes is bright, as businesses continue to invest in big data analytics to stay ahead of the competition. With the increasing dominance of technologies such as artificial intelligence (AI) and machine learning (ML), data lakes can become more intelligent and powerful, able to create predictive models and automate decision-making processes.

McKinsey suggests that businesses take full advantage of data lake technology and its ability to handle computing-intensive functions, like advanced analytics or machine learning. Organizations may want to build data-centric applications on top of the data lake that can seamlessly combine insights gained from both data lake resources and other applications. Data lakes can be used to develop new business models and revenue streams, as businesses seek ways to monetize their data assets.

Ready to harness the power of data lakes in your business? Kopius can help build the future of your data-driven organization by streamlining your data architecture and delivering powerful analytics through data governance, machine learning, data visualization, and more. Learn about our Data Lakes solutions.

Additional Resources


How to Avoid Context Switching and Be a Happier Software Engineer


Being part of Kopius offers opportunities for professional and academic growth.

Our Kopius Academy recently offered a session about context switching for software engineers and developers, which was led by our Director of Software Engineering, Yuri Brigance. This post summarizes Yuri’s key messages about context switching and how to reduce distractions when engineers need to focus.

Engineering is creative work – it requires education, skills, talent, and experience. Further, engineering activities have long durations and require research and preparation before executing a task.


“One of the things I’ve been trying to improve is how to allow myself to do focus work – deep work like writing code or architecture – while managing other responsibilities like pre-sales, having one-on-ones, etc.,” said Brigance.

Context switching in engineering is the process of switching from one task to another. People can focus on only one task at a time, so multitasking is really just rapid context switching.

There are two types of tasks: simple and demanding, and every task has two parts:

  1. Load the task. If you were a computer, that might mean opening and loading a program. But you are a human, so it is about remembering the context, such as where you left off, notes, etc.
  2. Save progress. When you are ready to stop working, you must save your context. The best way to end a task is to complete it, but if you must pause it, there is that “save” period where you have to put away what you were doing.

The time it takes to switch between tasks varies with the complexity of the task.

You don’t have control over how long it takes to save or load your context, so if you are interrupted, it will take longer to load. “Being interrupted is like having your program crash – it will take longer to get back into your workflow. If you are interrupted, then you will spend more time getting back into work the next time you start that task,” says Brigance.

Productivity tends to increase with time. Once you have built up your mental context, the longer you work without interruptions, the more productive you become. This is why it’s helpful to concentrate meetings in the first half of the day so that once the meetings are done, engineers can focus on code with fewer interruptions or distractions.

This has the added benefit of ending the day having made significant focused progress, which can help engineers to feel happier and less stressed.

There are two types of interruptions:

  • Simple interruption: A simple interruption doesn’t require you to prepare for it, such as a notification on Slack. You experience the interruption, and then rebuild your mental context – but you don’t need to save your work prior to this interruption. You can reduce simple interruptions by managing your status on Slack/Teams, turning off notifications, silencing your phone, etc. It’s helpful to batch simple tasks and do them all in one single chunk of time. You can also reduce the simple interruptions experienced by others by being conscious of people’s work and avoiding interrupting them if they are busy.
  • Complex interruption: A complex interruption is a task that requires preparation, such as a meeting. You can reduce the impact of complex interruptions by scheduling meetings close together so that 30-60-minute gaps between meetings don’t result in unfocused time.
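To make the cost of scattered meetings concrete, here is a rough sketch in Python. It assumes every gap between commitments loses a fixed number of minutes to loading context and to saving it again. The focus_minutes function, the 15-minute and 5-minute costs, and the two sample schedules are all illustrative assumptions, not figures from the session:

```python
def focus_minutes(day_start, day_end, meetings, load_cost=15, save_cost=5):
    """Sum the usable focus time in the gaps around meetings.

    Each gap loses load_cost minutes to rebuilding context and save_cost
    minutes to putting work away; both values are illustrative assumptions.
    """
    usable = 0
    cursor = day_start
    for start, end in sorted(meetings):
        gap = start - cursor
        usable += max(0, gap - load_cost - save_cost)
        cursor = max(cursor, end)
    # Count the stretch after the last meeting as one more gap.
    usable += max(0, day_end - cursor - load_cost - save_cost)
    return usable


# Times are minutes after 9:00; an eight-hour day ends at minute 480.
scattered = [(60, 90), (150, 180), (240, 270), (360, 390)]
clustered = [(0, 30), (30, 60), (60, 90), (90, 120)]

scattered_focus = focus_minutes(0, 480, scattered)
clustered_focus = focus_minutes(0, 480, clustered)
print(scattered_focus, clustered_focus)
```

Both schedules contain the same two hours of meetings, but under these assumptions the clustered day leaves more usable focus time because the load and save costs are paid once instead of five times.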

Here are additional tips to help you avoid the fatigue of context switching and enjoy more and better deep work:

  • If you have multiple meetings, consolidate them and schedule them as early in the day as possible to give people as much afternoon time as possible to focus.
  • Offer a no-meeting day so there is a day each week when large complex chunks of work get done with minimal interruptions.  
  • If your job has a lot of client interaction and research, breaking up the week by types of contexts can be helpful.
  • Eliminate unnecessary meetings.
  • Avoid 30–90-minute breaks between meetings.
  • Set aside small tasks for the breaks between meetings.
  • Decline low-priority meetings. Every meeting is important to someone, but if you are under a time crunch, you might need to skip a meeting.
  • Communicate with others when you need focus time.
  • Disable non-important push notifications.
  • Set aside time for breaks between tasks to reflect and plan.

Engineering and development work takes creativity and focused energy. Understanding the impact of context switching, and then finding ways to reduce its impact can result in happier, healthier, more productive engineers and better software solutions.

What is Kopius Academy?

Our company offers an educational program called Kopius Academy, where we provide educational sessions relevant to our careers in digital transformation. Whether they are led by our own team members or outside experts, we bring educational content to our teams every month that ranges from the highly technical to sessions about work/life balance and wellness.

Want to learn more from Kopius and our Kopius Academy? We regularly host educational events, webinars, and hackathons for the public, which you can find on our Careers page and on LinkedIn, Instagram, and Twitter!



Valence Group and MajorKey’s Latin American Division Combine to Create Kopius, a Nearshore Technology Powerhouse


Rebrand to Kopius Launches New Full Service Nearshore Digital Transformation Partner to Commercial and Public Sector Clients

Seattle, March 20, 2023 – Today marks the launch of Kopius, a nearshore digital solutions business co-located in Seattle and Buenos Aires. Kopius has been formed through the combination of Valence, a Seattle-based digital consulting firm, and the Latin American division of MajorKey, a Chicago-based technology services business. Both companies are part of The Acacia Group, specialist investors in digital transformation companies. Acacia is backing the formation of Kopius to service rapidly growing client demand for the exceptional nearshore digital consulting and delivery teams the combined business offers. MajorKey’s U.S. business will remain focused on the Identity and Access Management market. 

The launch of Kopius transforms the combined company’s value to its commercial and public sector clients. It brings together a highly skilled team drawn from the U.S. and Latin America, capable of working at scale and speed to tackle complex challenges across the array of digital technologies central to effective enterprise transformation. Today Kopius supports nearly 100 U.S. commercial and public sector clients with more than 600 consultants, designers, and engineers. Kopius delivers end-to-end capabilities across digital experience and strategy, technology solutions, and engineering and operational services. 

“This is an exciting day for everyone at Kopius. It crowns the integration of two great companies to create a new digital powerhouse. The whole team is united around the mission to solve digital tech challenges creatively,” said Jim Darrin, Kopius CEO. “Our true value to clients lies in our ability to achieve strategic clarity, build practical solutions for today, and continuously plan for what’s next. That way, we help clients get more value from their technology investments and navigate the future with confidence.” 

“Central to our strategy for Kopius is sustaining a challenging and rewarding home for the best technical talent in the business across the U.S. and Latin America,” said Matias Mazzuchelli, Kopius Chief Operating Officer. “Our work helping some of the world’s biggest brands retain their edge creates an exciting and diverse range of opportunities for consultants, designers, engineers, and developers to work at the cutting edge of technology and innovation. We’re excited to welcome new people to the Kopius family as we grow.”

“The launch of Kopius exemplifies our approach to building stronger businesses that achieve a greater impact for their clients and create new opportunities for their people,” said Tim Matthews, partner with Acacia. “Client demand is growing for the kind of agile, scalable, and high-value digital services that Kopius offers. With such an outstanding team, we know that they will make a major impact in their market.”

The leadership of Kopius combines industry-leading experience in technology strategy, solution design, engineering, and technical recruiting and team-building. Together, they will be relentlessly focused on designing quality solutions delivered by teams drawn from the best onshore and nearshore talent, technically skilled for the problems they solve, and culturally compatible with the clients they serve.

About Kopius

Kopius is a nearshore digital solutions company. We are resourceful and practical leaders of digital transformation. Kopius operates across all stages of the digital transformation journey with integrated consulting, design, and engineering services delivered by nearshore consulting and delivery teams. Applying creativity, curiosity, and resourcefulness to everything we do, we guide clients on their technology journey, helping them adapt to change and exploit new digital advances, driving continuous value from their technology investments. For more information, visit www.kopiustech.com.

About The Acacia Group

The Acacia Group is a specialist investment firm building stronger businesses by harnessing the power of digital transformation. We work closely with management teams as engaged and supportive partners, fostering resilient cultures of collaboration and innovation to make companies more valuable to their clients, employees and co-investors. By empowering skilful leaders, nurturing exceptional talent, investing in innovation and building distinctive brands, we create the qualities businesses need to achieve lasting success. For more information, please visit The Acacia Group or follow us on LinkedIn.


Women in Technology: Meet Bridgette Arthur-Mensah



We sat with Bridgette Arthur-Mensah, a leader in technology and engineering operations.

After studying electrical engineering at Rutgers and Columbia University, Bridgette started her professional career in hardware and then transitioned to software thanks to an unexpected opportunity. Now, she provides advisory services helping clients enhance user experiences and growth through business agility, tech innovation, and cost management for a variety of industries.

Arthur-Mensah also serves as the Vice President of the New York Chapter of the Healthcare Businesswomen’s Association and volunteers at the Center for Information & Study on Clinical Research Participation.

We spoke with Bridgette about trends in technology and how women fit into this industry. Here are the highlights of that exchange!

Describe the kind of work you are doing today.

Today I really think of my work as being a strategic leader and hands-on problem solver, and I focus on two things: the execution and the economics of technology.

For execution, I help organizations optimize and manage their technology investment. I look at the most optimal way to strategize, plan, and deliver services or products to customers and users. I make sure we are forward-thinking in how we are executing and being agile.

For economics, these days I’m focusing on cost control. I help organizations maximize economies of scale, think about how contracts are structured, and maximize purchasing power. The current economy makes it important to know how to use what you have and make money go further. For example, we might examine whether a business is collecting tech debt without first understanding which features are actually in use. It takes time and money to manage that tech debt, and I help companies scrutinize those decisions.  

How did you get started in technology?

I grew up in Ghana, West Africa, and in secondary school, I chose STEM as my focus area by process of elimination. I didn’t love literature and liberal arts, and I didn’t love biology (although my father would have loved for me to have been a doctor!). So, I focused on physics, engineering, and math. I enjoyed the challenge. I like the idea that you can build something, and it comes alive.

That early education put me on a path that led me to the US where I got my electrical engineering degrees. 

I started in hardware, working in engineering at IBM. It’s amazing how much computing power you can get from little atoms, and how atoms power everything that goes on in the software layer.

After spending 12 years working with semiconductors, I was contacted about a job that would pull me from hardware into software, and I went for it. The rest is history.

How did mentors and leadership factor into your success?

A good mentor can help you avoid certain pitfalls and do something better with some advice.  I wish I had connected with mentors earlier in my career because mentorship is great. If I were to do it again – and the advice I give my children is – find good mentors and keep them. Whether you are assigned to them or you build the relationships yourself, mentors are people that you can call for help when you are in tough situations.

Mentorship has helped me be intentional about my career, so I like to give back to my peers and those who are coming up in this industry.

How can we improve tech for women?

That’s a big question and we need more than one solution to make tech better for women. I will borrow from one of my role models, Indra Nooyi. In her book, My Life In Full, she showed that keeping women in business is a good economic decision. Similarly, keeping women in tech is good for the economy and the industry.

So how do we keep women in tech? Here are two urgent needs:

First, women are often responsible for caregiving, whether it is their parents, their children, or both. So, companies need to help women have a balance between home life and work life. Many companies are moving in the right direction, expanding things like maternal and paternal leave from six weeks to six months. Covid also de-stigmatized working from home, which helped in some ways. So, make the environment such that women can embrace it. Can you have company social events during the work day instead of after-hours? Can you give more notice before a multi-day offsite? If you make it hard for women to participate, you lose the benefit of their perspectives and experience.

The second urgent need is that leaders must be intentional about including women in strategic initiatives. Create opportunities for women to showcase their talent and advance their careers. This may mean that people in leadership positions need to address their unconscious biases. When our industry is infected with unconscious bias, it’s difficult for women to get equal opportunities. Leaders need to put unconscious bias in check and commit to growing women leaders by intentionally putting women in roles that allow them to stretch and show what they can bring to the table.

Remember that this is just as good for the bottom line as it is for women because keeping women in tech is good for tech.

What is one thing that you wish more people knew about supporting women in technology?

I wish more people would open their eyes to a fresh idea of what a leader could look, sound, and act like. There are differences between female and male leadership, and many women don’t fit outdated notions about what makes a successful leader. Letting go of those preconceptions can open doors.

Also, start young. I love that more organizations are marketing STEM to young girls now. We need to show girls that STEM is fun and impactful and that if they work in STEM, they can build products that change the way we live for good. We want girls to use their extraordinary talents in STEM to shape this world now and in the future.

What is one piece of advice you’d like to share with anyone reading?

First, tech is becoming a basic language and it’s necessary for us to be versed in it to make sure that it is serving us well. Even jobs that aren’t considered technical now require a level of tech literacy, from designing to legal services to medicine.  When you open yourself up to knowing what’s new and understanding it, you open yourself up to more opportunities and ways to apply yourself.

Second, if you are a woman in technology, it might seem like the future is bleak right now because of the recent economic environment. But remember that this is a cyclical industry. Don’t let the downturn right now dampen your desire to do more, or grow more, or find new things to do in technology. Your career path might have a slower start than you envisioned, but your ideas and capabilities are needed in technology and business.

My advice is to keep swimming. Network. Please don’t leave the workforce. Keep contributing. Keep at it.

What trends are you seeing in technology?

First, AI is changing productivity, and even managing to be somewhat accurate. The next trend will be that AI will be trained to improve its performance for less mainstream uses and audiences. For example, AI voice recognition has been trained to understand the intonation of dominant demographics, and we must expand the training of these AI models to cover minority cases. Broadening the datasets used in training could help to improve accuracy.

And speaking of AI, we also need to be thoughtful about how to apply strategy to AI deployments. It’s a good tool, and people are excited. Businesses need to think about how it helps differentiate their value prop with accuracy, of course.  

Second, businesses are scrutinizing the cost and economics of technology. For the past few years, there has been an insatiable appetite for technology and gadgets, especially in the health tech space. Investors are slowing down to question the economic value of these technologies and what it takes to keep them out there.

Third, globalization is changing cybersecurity. People want a seamless interconnected experience as we move from one tech to another, so we are sharing data and it is being used in ever-expanding ways. Businesses need to care about cybersecurity if they want to operate in countries with strict security rules (including but not limited to most of Europe and the US). And if your brand is going to depend on customer trust, it takes just one major breach for customers to go to the next supplier. And one breach could also result in the loss of future opportunities and supplier considerations. Businesses are watching security more closely than ever for these reasons.

What tech does the world need now more than ever?

The number of climate change-related natural disasters and the extent of them are alarming. My greatest hope is that technology can help solve our global environmental issue. It’s the most important issue in our time.

To learn more about our digital transformation capabilities, reach out to us today! Kopius is a leader in nearshore digital technology consulting and services.




5 Industries Winning at Artificial Intelligence


By Lindsay Cox

Artificial Intelligence (AI) and Machine Learning (ML) were already the technologies on everyone’s radar when the year started, and the release of foundation models like ChatGPT only increased the excitement about the ways that data technology can change our lives and our businesses. We are excited about these five industries that are winning at artificial intelligence.

As an organization, data and AI projects are right in our sweet spot. ChatGPT is very much in the news right now (and is a super cool tool – you can check it out here if you haven’t already).

I also enjoyed watching Watson play Jeopardy as a former IBMer 😊

Here are a few real-world examples of how five industries are winning at AI. We have included those use cases along with examples where our clients have been leading the way on AI-related projects.

You can find more case studies about digital transformation, data, and software application development in our Case Studies section of the website.

Consumer brands: Visualizing made easy

Brands are helping customers to visualize the outcome of their products or services using computer vision and AI. Consumers can virtually try on a new pair of glasses, a new haircut, or a fresh outfit, for example.  AI can also be used to visualize a remodeled bathroom or backyard.

We helped a teledentistry, web-first brand develop a solution using computer vision to show a customer how their smile would look after potential treatment. We paired the computer vision solution with a mobile web application so customers could “see their new selfie.” 

Consumer questions can be resolved faster and more accurately

Customer service can make or break customer loyalty, which is why chatbots and virtual assistants are being deployed at scale to reduce average handle time and average speed-of-answer, and to increase first-call resolutions.

We worked with a regional healthcare system to design and develop a “digital front door” to improve patient and provider experiences. The solution includes an interactive web search and chatbot functionality. By getting answers to patients and providers more quickly, the healthcare system is able to increase satisfaction and improve patient care and outcomes.

Finance: Preventing fraud

There’s a big opportunity for financial services organizations to use AI and deep learning solutions to recognize suspicious transactions and thwart credit card fraud, which helps reduce costs. Banks generate huge volumes of data that can be used to train machine learning models to flag fraudulent transactions, an approach also known as anomaly detection.
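As a rough illustration of anomaly detection, the Python sketch below flags transactions whose amounts sit far from the typical value. It uses a simple statistical rule (a modified z-score based on the median) as a stand-in for a trained machine learning model, which is what a bank would more likely use in practice. The flag_anomalies function, the 3.5 threshold, and the sample amounts are assumptions for illustration only:

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(amount - center) for amount in amounts)
    if mad == 0:
        return []  # no spread to measure against
    return [
        index
        for index, amount in enumerate(amounts)
        if 0.6745 * abs(amount - center) / mad > threshold
    ]


# Hypothetical card transactions in dollars; the last one is unusually large.
transactions = [23.10, 18.75, 31.40, 27.95, 22.00, 19.60, 25.30, 940.00]
flagged = flag_anomalies(transactions)
print(flagged)
```

A median-based score is used here because a single extreme value can inflate the mean and standard deviation enough to hide itself. A production system would also weigh features beyond the amount, such as merchant, location, and timing.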

Agriculture: Supporting ESG goals by operating more sustainably

Data technologies like computer vision can help organizations see things that humans miss. This can help with the climate crisis because what goes unnoticed can include water waste, energy waste, and misdirected landfill waste.

The agritech industry is already harnessing data and AI since our food producers and farmers are under extreme pressure to produce more crops with less water. For example, John Deere created a robot called “See and Spray” that uses computer vision technology to detect weeds among cotton plants and spray herbicide in precise amounts.

We worked with PrecisionHawk to use computer vision combined with drone-based photography to analyze crops and fields to give growers precise information to better manage crops. The data produced through the computer vision project helped farmers to understand their needs and define strategies faster, which is critical in agriculture.

Healthcare: Identify and prevent disease

AI has an important role to play in healthcare, with uses ranging from patient call support to the diagnosis and treatment of patients.

For example, healthcare companies are creating clinical decision support systems that warn a physician in advance when a patient is at risk of having a heart attack or stroke, adding critical time to their response window.

AI-supported e-learning is also helping to design learning pathways, personalized tutoring sessions, content analytics, targeted marketing, automatic grading, etc. AI has a role to play in addressing the critical healthcare training need in the wake of a healthcare worker shortage.

Artificial intelligence and machine learning are emerging as the most game-changing technologies at play right now. These are a few examples that highlight the broad use and benefits of data technologies across industries. The actual list of use cases and examples is infinite and expanding.

What needs to happen for your company to win at artificial intelligence? To learn more about Artificial Intelligence and Machine Learning, reach out to us today! Kopius is a leader in nearshore digital technology consulting and services.




Addressing AI Bias – Four Critical Questions


By Hayley Pike

As AI becomes even more integrated into business, so does AI bias.

On February 2, 2023, Microsoft released a statement from Vice Chair & President Brad Smith about responsible AI. In the wake of the newfound influence of ChatGPT and Stable Diffusion, considering the history of racial bias in AI technologies is more important than ever.

The discussion around racial bias in AI has been going on for years, and with it, there have been signs of trouble. Google fired two of its researchers, Dr. Timnit Gebru and Dr. Margaret Mitchell, after they published research papers outlining how Google’s language and facial recognition AI were biased against women of color. And speech recognition software from Amazon, Microsoft, Apple, Google, and IBM misidentified speech from Black people at a rate of 35%, compared to 19% of speech from White people.

In more recent news, DEI tech startup Textio analyzed ChatGPT and showed that it skewed toward writing job postings for younger, male, White candidates, and that the bias increased for prompts for more specific jobs.

If you are working on an AI product or project, you should take steps to address AI bias. Here are four important questions to help make your AI more inclusive:

  1. Have we incorporated ethical AI assessments into the production workflow from the beginning of the project? Microsoft’s Responsible AI resources include a project assessment guide.
  2. Are we ready to disclose our data source strengths and limitations? Artificial intelligence is as biased as the data sources it draws from. The project should disclose who the data is prioritizing and who it is excluding.
  3. Is our AI production team diverse? How have we accounted for the perspectives of people who will use our AI product but are not represented in the project team or tech industry?
  4. Have we listened to diverse AI experts? Dr. Joy Buolamwini and Dr. Inioluwa Deborah Raji, currently at the MIT Media Lab, are two Black female researchers who are pioneers in the field of racial bias in AI.

Rediet Abebe is a computer scientist and co-founder of Black in AI. Abebe sums it up like this:

“AI research must also acknowledge that the problems we would like to solve are not purely technical, but rather interact with a complex world full of structural challenges and inequalities. It is therefore crucial that AI researchers collaborate closely with individuals who possess diverse training and domain expertise.”

To learn more about artificial intelligence and machine learning, reach out to us today! Kopius is a leader in nearshore digital technology consulting and services.




ChatGPT and Foundation Models: The Future of AI-Assisted Workplace


By Yuri Brigance

The rise of generative models such as ChatGPT and Stable Diffusion has generated a lot of discourse about the future of work and the AI-assisted workplace. There is tremendous excitement about the awesome new capabilities such technology promises, as well as concerns over losing jobs to automation. Let’s look at where we are today, how we can leverage these new AI-generated text technologies to supercharge productivity, and what changes they may signal to a modern workplace.

Will ChatGPT Take Away Your Job?

That’s the question on everyone’s mind. AI can generate images, music, text, and code. Does this mean that your job as a designer, developer, or copywriter is about to be automated? Well, yes. Your job will be automated in the sense that it is about to become a lot more efficient, but you’ll still be in the driver’s seat.

First, not all automation is bad. Before personal computers became mainstream, taxes were completed with pen and paper. Did modern tax software put accountants out of business? Not at all. It made their job easier by automating repetitive, boring, and boilerplate tasks. Tax accountants are now more efficient than ever and can focus on mastering tax law rather than wasting hours pushing paper. They handle more complicated tax cases, those personalized and tailored to you or your business. Similarly, it’s fair to assume that these new generative AI tools will augment creative jobs and make them more efficient and enjoyable, not supplant them altogether.

Second, generative models are trained on human-created content. This ruffles many feathers, especially among those in the creative industry whose art is being used as training data without the artist’s explicit permission, allowing the model to replicate their unique artistic style. Stability.ai plans to address this problem by enabling artists to opt out of having their work be part of the dataset, but realistically there is no way to guarantee compliance and no definitive way to prove whether your art is still being used to train models. But this does open interesting opportunities. What if you licensed your style to an AI company? If you are a successful artist and your work is in demand, there could be a future where you license your work to be used as training data and get paid any time a new image is generated based on your past creations. It is possible that responsible AI creators could measure the gradient updates during training, and the percentage of neuron activations associated with specific samples of data, to estimate how much of your licensed art was used by the model to generate an output. This would work much like Spotify paying a small fee to the musician every time someone plays one of their songs, or websites like Flaticon.com paying a fee to the designer every time one of their icons is downloaded. Long story short, it is likely that we’ll soon see stricter controls over how training datasets are constructed with regard to licensed versus public-domain work.

Let’s look at some positive implications of this AI-assisted workplace and technology as it relates to a few creative roles and how this technology can streamline certain tasks.

As a UI designer, when designing web and mobile interfaces you likely spend significant time searching for stock imagery. The images must be relevant to the business, have the right colors, allow for some space for text to be overlaid, etc. Some images may be obscure and difficult to find. Hours could be spent finding the perfect stock image. With AI, you can simply generate an image based on text prompts. You can ask the model to change the lighting and colors. Need to make room for a title? Use inpainting to clear an area of the image. Need to add a specific item to the image, like an ice cream cone? Show AI where you want it, and it’ll seamlessly blend it in. Need to look up complementary RGB/HEX color codes? Ask ChatGPT to generate some combinations for you.

Will this put photographers out of business? Most likely not. New devices continue to come out, and they need to be incorporated into the training data periodically. If we are clever about licensing such assets for training purposes, you might end up making more revenue than before, since AI can use a part of your image and pay you a partial fee for each request many times a day, rather than having one user buy one license at a time. Yes, work needs to be done to enable this functionality, so it is important to bring this up now and work toward a solution that benefits everyone. But generative models trained today will be woefully outdated in ten years, so the models will continue to require fresh human-generated real-world data to keep them relevant. AI companies will have a competitive edge if they can license high-quality datasets, and you never know which of your images the AI will use – you might even figure out which photos to take more of to maximize that revenue stream.

Software engineers, especially those in professional services, frequently need to switch between multiple programming languages. Even on the same project, they might use Python, JavaScript / TypeScript, and Bash at the same time. It is difficult to context switch and remember all the peculiarities of a particular language’s syntax. How to efficiently do a for-loop in Python vs Bash? How to deploy a Cognito User Pool with a Lambda authorizer using AWS CDK? We end up Googling these snippets because working with this many languages forces us to remember high-level concepts rather than specific syntactic sugar. GitHub Gist exists for the sole purpose of offloading snippets of useful code from local memory (your brain) to external storage. With so much to learn, and things constantly evolving, it’s easier to be aware that a particular technique or algorithm exists (and where to look it up) rather than remember it in excruciating detail as if reciting a poem. Tools like ChatGPT integrated directly into the IDE would reduce the amount of time developers spend remembering how to create a new class in a language they haven’t used in a while, how to set up branching logic, or how to build a script that moves a bunch of files to AWS S3. They could simply ask the IDE to fill in this boilerplate and move on to solving the more interesting algorithmic challenges.

An example of asking ChatGPT how to use Python decorators. The text and example code snippet are very informative.
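The screenshot itself is not reproduced here. As a stand-in, the following is a minimal sketch of the kind of decorator example such a prompt might yield. It is not ChatGPT’s actual output, and the names `timed`, `add`, and `last_elapsed` are illustrative assumptions.

```python
import functools
import time


def timed(func):
    """Decorator that records how long the most recent call to func took."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # Store the elapsed time (in seconds) on the wrapper for later inspection.
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None  # no call has been made yet
    return wrapper


@timed
def add(a, b):
    """Return the sum of a and b."""
    return a + b
```

Calling `add(2, 3)` behaves like the undecorated function, and `add.last_elapsed` then holds the duration of that call.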

For copywriters, it can be difficult to overcome the writer’s block of not knowing where to start or how to conclude an article. Sometimes it’s challenging to concisely describe a complicated concept. ChatGPT can be helpful in this regard, especially as a tool to quickly look up clarifying information about a topic. Caution is justified, though, as demonstrated recently by Stephen Wolfram, CEO of Wolfram Alpha, who makes a compelling argument that ChatGPT’s answers should not always be taken at face value. So doing your own research is key. That being the case, OpenAI’s model usually provides a good starting point for explaining a concept, and at the very least it can provide pointers for further research. But for now, writers should always verify its answers. Let’s also be reminded that ChatGPT has not been trained on any new information created after the year 2021, so it is not aware of new developments in the war in Ukraine, current inflation figures, or the recent fluctuations of the stock market, for example.

In Conclusion

Foundation models like ChatGPT and Stable Diffusion can augment and streamline workflows, and they are still far from being able to directly threaten a job. They are useful tools that are far more capable than narrowly focused deep learning models, and they require a degree of supervision and caution. Will these models become even better 5-10 years from now? Undoubtedly so. And by that time, we might just get used to them and have several years of experience working with these AI agents, including their quirks and bugs.

There is one important thing to take away about foundation models and the future of the AI-assisted workplace: today they are still very expensive to train. They are not connected to the internet and can’t consume information in real-time, in online incremental training mode. There is no database to load new data into, which means that to incorporate new knowledge, the dataset must grow to encapsulate recent information, and the model must be fine-tuned or re-trained from scratch on this larger dataset. It’s difficult to verify that the model outputs factually correct information since the training dataset is unlabeled and the training procedure is not fully supervised. There are interesting open source alternatives on the horizon (such as the U-Net-based Stable Diffusion), and techniques to fine-tune portions of the larger model to a specific task at hand, but those are more narrowly focused, require a lot of tinkering with hyperparameters, and are generally out of scope for this particular article.

It is difficult to predict exactly where foundation models will be in five years and how they will impact the AI-assisted workplace since the field of machine learning is rapidly evolving. However, it is likely that foundation models will continue to improve in terms of their accuracy and ability to handle more complex tasks. For now, though, it feels like we still have a bit of time before seriously worrying about losing our jobs to AI. We should take advantage of this opportunity to hold important conversations now to ensure that the future development of such systems maintains an ethical trajectory.

To learn more about our generative AI solutions, reach out to us today! Kopius is a leader in nearshore digital technology consulting and services.




What Separates ChatGPT and Foundation Models from Regular AI Models?


By Yuri Brigance

This article introduces what separates foundation models from regular AI models. We explore the reasons these models are difficult to train and how to understand them in the context of more traditional AI models.


What Are Foundation Models?

What are foundation models, and how are they different from traditional deep learning AI models? Stanford’s Institute for Human-Centered Artificial Intelligence defines a foundation model as “any model that is trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks”. This describes a lot of narrow AI models as well, such as MobileNets and ResNets – they too can be fine-tuned and adapted to different tasks.

The key distinctions here are “self-supervision at scale” and “wide range of tasks”.

Foundation models are trained on massive amounts of unlabeled/semi-labeled data, and the model contains orders of magnitude more trainable parameters than a typical deep learning model meant to run on a smartphone. This makes foundation models capable of generalizing to a much wider range of tasks than smaller models trained on domain-specific datasets. It is a common misconception that throwing lots of data at a model will suddenly make it do anything useful without further effort.  Actually, such large models are very good at finding and encoding intricate patterns in the data with little to no supervision – patterns which can be exploited in a variety of interesting ways, but a good amount of work needs to happen in order to use this learned hidden knowledge in a useful way.

The Architecture of AI Foundation Models

Unsupervised, semi-supervised, and transfer learning are not new concepts, and to a degree, foundation models fall into this category as well. These learning techniques trace their roots back to the early days of generative modeling such as Restricted Boltzmann Machines and Autoencoders. These simpler models consist of two parts: an encoder and a decoder. The goal of an autoencoder is to learn a compact representation (known as encoding or latent space) of the input data that captures the important features or characteristics of the data, aka “progressive linear separation” of the features that define the data. This encoding can then be used to reconstruct the original input data or generate entirely new synthetic data by feeding cleverly modified latent variables into the decoder.

An example of a convolutional image autoencoder architecture, trained to reconstruct its own input (e.g., images). Intelligently modifying the latent space allows us to generate entirely new images. One can expand this by adding an extra model that encodes text prompts into latent representations understood by the decoder to enable text-to-image functionality.
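To make the encoder/decoder idea concrete, here is a minimal sketch of a toy linear autoencoder in plain Python. It is far simpler than the convolutional architecture described above: it compresses 2-D points into a 1-D latent value and learns to reconstruct them with gradient descent. The dataset, the starting weights, the learning rate, and all function names are assumptions chosen for illustration.

```python
# Toy dataset: 2-D points that all lie on the line y = 2x, so a single
# latent number is enough to describe each point.
DATA = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]


def encode(x, w):
    """Encoder: project a 2-D input onto a 1-D latent value."""
    return w[0] * x[0] + w[1] * x[1]


def decode(z, v):
    """Decoder: expand a 1-D latent value back into a 2-D reconstruction."""
    return (v[0] * z, v[1] * z)


def mean_loss(w, v):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in DATA:
        r = decode(encode(x, w), v)
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total / len(DATA)


def train(steps=2000, lr=0.02):
    """Full-batch gradient descent on the reconstruction error."""
    w = [0.1, 0.1]  # encoder weights (arbitrary small starting values)
    v = [0.1, 0.1]  # decoder weights
    n = len(DATA)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in DATA:
            z = encode(x, w)
            r = decode(z, v)
            err = (r[0] - x[0], r[1] - x[1])
            # Gradient of the squared error with respect to the latent value.
            dz = 2.0 * (err[0] * v[0] + err[1] * v[1])
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * z / n
                grad_w[i] += dz * x[i] / n
        for i in range(2):
            w[i] -= lr * grad_w[i]
            v[i] -= lr * grad_v[i]
    return w, v
```

After training, `decode(encode(point, w), v)` should return something close to the original point, and feeding a latent value that never occurred in the data into `decode` produces a new point on the same line, which is the toy analogue of generating synthetic data from the latent space.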

Many modern ML models use this architecture, and the encoder portion is sometimes referred to as the backbone with the decoder being referred to as the head. Sometimes the models are symmetrical, but frequently they are not. Many model architectures can serve as the encoder or backbone, and the model’s output can be tailored to a specific problem by modifying the decoder or head. There is no limit to how many heads a model can have, or how many encoders. Backbones, heads, encoders, decoders, and other such higher-level abstractions are modules or blocks built using multiple lower-level linear, convolutional, and other types of basic neural network layers. We can swap and combine them to produce different tailor-fit model architectures, just like we use different third-party frameworks and libraries in traditional software development. This, for example, allows us to encode a phrase into a latent vector which can then be decoded into an image.
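The backbone-and-heads idea can be sketched without any ML framework. In the hypothetical example below, one shared encoder produces a latent vector and two interchangeable heads consume it. The functions, weights, and thresholds are made up for illustration and stand in for real neural network modules.

```python
from typing import Callable, Dict, List

Vector = List[float]


def backbone(x: Vector) -> Vector:
    """Hypothetical shared encoder: maps an input to a 2-D latent (sum, mean)."""
    total = sum(x)
    return [total, total / len(x)]


def regression_head(latent: Vector) -> float:
    """Hypothetical head that turns the latent vector into a single number."""
    return 0.5 * latent[0] + 1.0 * latent[1]


def classification_head(latent: Vector) -> str:
    """Hypothetical head that turns the same latent vector into a label."""
    return "high" if latent[1] > 1.0 else "low"


class MultiHeadModel:
    """One encoder feeding any number of named heads."""

    def __init__(self, encoder: Callable[[Vector], Vector],
                 heads: Dict[str, Callable[[Vector], object]]):
        self.encoder = encoder
        self.heads = heads

    def __call__(self, x: Vector) -> Dict[str, object]:
        latent = self.encoder(x)  # computed once, shared by every head
        return {name: head(latent) for name, head in self.heads.items()}


model = MultiHeadModel(backbone, {
    "regression": regression_head,
    "classification": classification_head,
})
```

Swapping a head means replacing one entry in the `heads` dictionary while the encoder stays untouched, which mirrors how a task-specific head can be attached to an existing backbone.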

Foundation Models for Natural Language Processing

Modern Natural Language Processing (NLP) models like ChatGPT fall into the category of Transformers. The transformer concept was introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al. and has since become the basis for many state-of-the-art models in NLP. The key innovation of the transformer model is the use of self-attention mechanisms, which allow the model to weigh the importance of different parts of the input when making predictions. These models make use of something called an “embedding”, which is a mathematical representation of a discrete input, such as a word, a character, or an image patch, in a continuous, high-dimensional space. Embeddings are used as input to the self-attention mechanisms and other layers in the transformer model to perform the specific task at hand, such as language translation or text summarization. ChatGPT isn’t the first, nor the only transformer model around. In fact, transformers have been successfully applied in many other domains such as computer vision and sound processing.
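As a rough illustration of self-attention, the sketch below implements scaled dot-product attention in plain Python. It makes a simplifying assumption: the embeddings are used directly as queries, keys, and values, whereas real transformers apply learned projection matrices and multiple attention heads. The function names and the toy embeddings are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with queries = keys = values."""
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        output = [
            sum(w * value[j] for w, value in zip(weights, embeddings))
            for j in range(dim)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token embeddings; the first and third are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` shows how much one token attends to every token in the sequence. Here the first token gives more weight to the identical third token than to the dissimilar second one.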

So if ChatGPT is built on top of existing concepts, what makes it so different from all the other state-of-the-art model architectures already in use today? A simplified explanation of what distinguishes a foundation model from a “regular” deep learning model is the immense scale of the training dataset as well as the number of trainable parameters that a foundation model has over a traditional generative model. An exceptionally large neural network trained on a truly massive dataset gives the resulting model the ability to generalize to a wider range of use cases than its more narrowly focused brethren, hence serving as a foundation for an untold number of new tasks and applications. Such a large model encodes many useful patterns, features, and relationships in its training data. We can mine this body of knowledge without necessarily re-training the entire encoder portion of the model. We can attach different new heads and use transfer learning and fine-tuning techniques to adapt the same model to different tasks. This is how just one model (like Stable Diffusion) can perform text-to-image, image-to-image, inpainting, super-resolution, and even music generation tasks all at once.

Challenges in Training Foundation Models

The GPU computing power and human resources required to train a foundation model like GPT from scratch dwarf those available to individual developers and small teams. The models are simply too large, and the dataset is too unwieldy. Such models cannot (as of now) be cost-effectively trained end-to-end and iterated using commodity hardware.

Although the concepts may be well explained by published research and understood by many data scientists, the engineering skills and eye-watering costs required to wire up hundreds of GPU nodes for months at a time would stretch the budgets of most organizations. And that’s ignoring the costs of dataset access, storage, and data transfer associated with feeding the model massive quantities of training samples.

There are several reasons why models like ChatGPT are currently out of reach for individuals to train:

  1. Data requirements: Training a large language model like ChatGPT requires a massive amount of text data. This data must be high-quality and diverse and is typically obtained from a variety of sources such as books, articles, and websites. This data is also preprocessed to get the best performance, which is an additional task that requires knowledge and expertise. Storage, data transfer, and data loading costs are substantially higher than what is used for more narrowly focused models.
  2. Computational resources: ChatGPT requires significant computational resources to train. This includes networked clusters of powerful GPUs and a large amount of memory, both volatile and non-volatile. The cost of running such a computer cluster can easily reach hundreds of thousands of dollars per experiment.
  3. Training time: Training a foundation model can take several weeks or even months, depending on the computational resources available. Wiring up and renting this many resources requires a lot of skill and a generous time commitment, not to mention associated cloud computing costs.
  4. Expertise: Getting a training run to complete successfully requires knowledge of machine learning, natural language processing, data engineering, cloud infrastructure, networking, and more. Such a large cross-disciplinary set of skills is not something that can be easily picked up by most individuals.

Accessing Pre-Trained AI Models

That said, there are pre-trained models available, and some can be fine-tuned with a smaller amount of data and resources for a more specific and narrower set of tasks, which is a more accessible option for individuals and smaller organizations.

Stable Diffusion took $600K to train – the equivalent of 150K GPU hours. That is a cluster of 256 GPUs running 24/7 for nearly a month. Stable Diffusion is considered a cost reduction compared to GPT. So, while it is indeed possible to train your own foundation model using commercial cloud providers like AWS, GCP, or Azure, the time, effort, required expertise, and overall cost of each iteration impose limitations on their use. There are many workarounds and techniques to re-purpose and partially re-train these models, but for now, if you want to train your own foundation model from scratch, your best bet is to apply to one of the few companies that have access to the resources necessary to support such an endeavor.
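The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in this article ($600K, 150K GPU hours, 256 GPUs). The function name is hypothetical, and the implied hourly rate is a derived estimate rather than a published price.

```python
def training_run_summary(total_cost_usd, gpu_hours, gpu_count):
    """Derive the implied hourly rate and wall-clock duration of a training run."""
    cost_per_gpu_hour = total_cost_usd / gpu_hours
    # All GPUs run in parallel, so wall-clock time is GPU hours / GPU count.
    wall_clock_hours = gpu_hours / gpu_count
    wall_clock_days = wall_clock_hours / 24
    return cost_per_gpu_hour, wall_clock_hours, wall_clock_days


rate, hours, days = training_run_summary(
    total_cost_usd=600_000,  # figure quoted in the article
    gpu_hours=150_000,       # figure quoted in the article
    gpu_count=256,           # figure quoted in the article
)
print(f"${rate:.2f} per GPU hour, {hours:.0f} hours, {days:.1f} days")
```

The result works out to about $4 per GPU hour and a little over 24 days of continuous running, which is consistent with “nearly a month”.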

Contact Us for AI Services

If you are ready to leverage artificial intelligence and machine learning solutions, reach out to us today! Kopius is a leader in nearshore digital technology consulting and services.




Data Trends: Six Ways Data Will Change Business in 2023 and Beyond


By Kristina Scott

Data is big and getting bigger. We’ve tracked six major data-driven trends for the coming year.


Data is one of the fastest-growing and most innovative opportunities today to shape the way we work and lead. IDC predicts that by 2024, the inability to perform data- and AI-driven strategy will negatively affect 75% of the world’s largest public companies. And by 2025, 50% of those companies will promote data-informed decision-making by embedding analytics in their enterprise software (up from 33% in 2022), boosting demand for more data solutions and data-savvy employees.

Here is how data trends will shift in 2023 and beyond:

1. Data Democratization Drives Data Culture

If you think data is only relevant to analysts with advanced knowledge of data science, we’ve got news for you.  Data democratization is one of the most important trends in data. Gartner research forecasts that 80% of data-driven initiatives that are focused on business outcomes will become essential business functions by 2025.

Organizations are creating a data culture by attracting data-savvy talent and promoting data use and education for employees at all levels. To support data democratization, data must be exact, easily digestible, and accessible.

Research by McKinsey found that high-performing companies have a data leader in the C-suite and make data and self-service tools universally accessible to frontline employees.

2. Hyper-Automation and Real-Time Data Lower Costs

Real-time data and its automation will be the most valuable big data tools for businesses in the coming years. Gartner forecasts that by 2024, rapid hyper-automation will allow organizations to lower operational costs by 30%. And by 2025, the market for hyper-automation software will hit nearly $860 billion.

3. Artificial Intelligence and Machine Learning (AI & ML) Continue to Revolutionize Operations

The ability to implement AI and ML in operations will be a significant differentiator. Verta Insights found that industry leaders that outperform their peers financially are more than 2x as likely to ship AI projects, products, or features, and have made AI/ML investments at a higher level than their peers.

AI and ML technologies will boost the Natural Language Processing (NLP) market. NLP enables machines to understand and communicate with us in spoken and written human languages. The NLP market size will grow from $15.7 billion in 2022 to $49.4 billion by 2027, according to research from MarketsandMarkets.

We have seen the wave of interest in OpenAI’s ChatGPT, a conversational language-generation tool. This highly scalable technology could revolutionize a range of use cases, from summarizing changes to legal documents to completely changing how we research information through dialogue-like interactions, says CNBC.

This can have implications in many industries. For example, the healthcare sector already employs AI for diagnosis and treatment recommendations, patient engagement, and administrative tasks. 

4. Data Architecture Leads to Modernization

Data architecture accelerates digital transformation because it solves complex data problems through the automation of baseline data processes, increases data quality, and minimizes silos and manual errors. Companies modernize by leaning on data architecture to connect data across platforms and users. Companies will adopt new software, streamline operations, find better ways to use data, and discover new technological needs.

According to MuleSoft, organizations are ready to automate decision-making, dynamically improve data usage, and cut data management efforts by up to 70% by embedding real-time analytics in their data architecture.

5. Multi-Cloud Solutions Optimize Data Storage

Cloud use is accelerating. Companies will increasingly opt for a hybrid cloud, which combines the best aspects of private and public clouds.

Companies can access data collected by third-party cloud services, which reduces the need to build custom data collection and storage systems, which are often complex and expensive.

In the Flexera State of Cloud Report, 89% of respondents have a multi-cloud strategy, and 80% are taking a hybrid approach.

6. Enhanced Data Governance and Regulation Protect Users

Effective data governance will become the foundation for impactful and valuable data. 

As more countries introduce laws to regulate the use of various types of data, data governance comes to the forefront of data practices. European GDPR, Canadian PIPEDA, and Chinese PIPL won’t be the last laws that are introduced to protect citizen data.

Gartner has predicted that by 2023, 65% of the world’s population will be covered by regulations like GDPR. In turn, users will be more likely to trust companies with their data if they know it is more regulated.

Valence works with clients to implement a governance framework, find sources of data and data risk, and activate the organization around this innovative approach to data and process governance, including education, training, and process development. Learn more.

What these data trends add up to

As we step into 2023, organizations that understand current data trends can harness data to become more innovative, strategic, and adaptable. Our team helps clients with data assessments, by designing and structuring data assets, and by building modern data management solutions. We strategically integrate data into client businesses, use machine learning and artificial intelligence to create proactive insights, and create data visualizations and dashboards to make data meaningful.  

We help clients to develop a solution and create a modern data architecture that supports differentiated, cloud-enabled scalability, self-service capability, and faster time-to-market for new data products and solutions. Learn more.



Women in Technology – Meet Aravinda Gollapudi


“Technology is an enabler that will be a game changer in shaping society. Women have a role in how that technology is used and how society will be changed.” Aravinda Gollapudi, Head of Platform and Technology at Sage

We sat down with Aravinda Gollapudi, Head of Platform and Technology at Sage, a $2B company that provides small and medium-sized businesses with finance, HR, and payroll software. At Sage, she leads a globally dispersed organization of roughly 270 employees across product, technology, release management, program management, and more. Aravinda also rounds out her technical work by serving as a board advisor for Artifcts and Loopr.

We spoke with Aravinda about technology and how women fit into this industry. Here are the highlights of that exchange.

What is your role in technology? What are you doing today?

I create an operating model for platforms, processes, and organizations to combine speed and scale. This drives market leadership and innovative solutions while accelerating velocity through organizational structure. I do this by leading the technology organization for cloud-native financial services for midmarket at Sage while also driving the product roadmap and strategy for platform as a business unit leader.

I also advise, mentor, and partner with CEOs of startup companies as a board advisor around technologies like AI/ML, SaaS, and cloud, as well as organizational strategy and business models. I also help bring my network together to drive go-to-market activities.

I have a unique opportunity with my role to drive the convergence of business outcomes and technology/investment enablers by identifying, blueprinting, and leading solutions to market – for today, tomorrow, and the future.

How did you get started in technology?

I think my inclination toward technology may be attributed to my interest in mathematics. I am old enough to appreciate how personal computers fueled the exponential adoption of technology!

The current generation of technologists has immense computing power at its fingertips, but I started my foray into technology when I had to use UNIX servers to do my work.

Early in my career, I wanted to go into academia and research in physics. I was working on research in quantum optics after I graduated with my Master’s in Physics. I spent a lot of time with programming models in Fortran, a language used in scientific computing. Fortran introduced me to computer programming.

My interest in technology got stronger while I was pursuing my second Master’s, in Computer Engineering. Although I was taking coursework on hardware and software, I gravitated toward software programming.

What can you tell us about the people who paved the way for you? How did mentors factor into your success?

A big part of my career was shaped very early by my parents, who are both teachers. They instilled the importance of learning and hard work. My dad was a teacher and a principal and was a role model in raising the bar on work ethic, discipline, respect, and courage. My mom, through her multiple master’s degrees and pursuit of continual learning, showed us that it was important to keep learning.

Later in my career, I was fortunate to have the support of my managers, mentors, and colleagues. I leaned on them to learn the craft of software, but also organizational design, product strategy, and overall leadership. I am fortunate to have mentors who challenged me to be better and to watch for my blind spots. I still lean on them to this day. A few of my mentors include Christine Heckart, Jeff Collins, Himanshu Baxi, Keith Olsen, and Kathleen Wilson. They have been my managers or mentors who gave me candid feedback, motivated me, and helped me grow my leadership skills.

I would be remiss if I didn’t mention the support and encouragement from my husband! He has always pushed me to take on challenges and supported me while we both balanced family and work.

How can we improve tech for women?

If we want to improve tech for women, we must invest in girls in technology. Hiring managers need to overcome unconscious bias and create early career opportunities for girls.

Mentorship is crucial: We need to acknowledge that the learning and career path is often different for women. Having a strong mentor irrespective of gender helps women learn how to deal with situational issues and career development. Women leaders who can take on this mantle to share their experience and mentor rising stars will help those who do not have a straight journey line in their careers.

Given the smaller percentage of women represented in technology, I am happy to see the trend in the recent past elevating this topic at all levels.  By becoming mentors, diversity champions can make a real impact on improving the trend.

We need to invest in allyship and mentorship and elevate the importance of gender diversity. For example, with board searches, organizations like 50/50 Women on Boards elevate the value of having gender diversity and work on legislative support. We need more of that or else we leave behind half the population.

What is one thing you wish more people knew to support women in technology?

I wish more people understood the impact of unconscious bias. Most people do not intend to be biased, but human nature makes us lean toward certain decisions or actions. The tech industry would greatly improve if more people took simple steps to avoid their unconscious bias, like making decisions in several settings (to avoid the time-of-day effect), ensuring that names and accents don’t impact hiring decisions, and investing in diversity.

What’s around the corner in technology? What trends excite you?

There are three significant trends that excite me right now: Artificial Intelligence/Machine Learning, data, and sustainable tech.

We are living in the world of AI Everywhere and there is more to come. Five out of six Americans use AI services daily. I expect AI to keep shaping automation and intelligent interactions, and drive efficiency.

This is also an exciting time to work with data. We are moving towards a hyper-connected digital world, and the old way of doing things required us to harness vast amounts of digital data from siloed sources. The trend is moving toward building networks and connections that will fuel more complex machine-to-machine interactions. This data connectivity will impact our lives at home, at school, and at the office, and will dramatically change how we conduct business.

And I am particularly excited about trends in sustainable technology. We will see more investment in technologies that reduce the impact of compute-hungry technology. I’m anticipating an evolution to more environmentally sustainable investments, which will help us reduce the wasteful use of resources such as data centers, storage, and computing.

What does the tech world need now more than ever?

The tech world needs better data security and more diversity.

Data is highly accessible in our lives (in part thanks to social media), so we need more investment in privacy and security. We already have seen the impact of this need across personal lives, the political landscape, and business.

For too long, we have not invested enough in diversity, so we have a lot of catching up to do. In the world of technology, we are woefully behind in diversity in leadership positions, particularly in the US tech sector where about 20% of technology leadership positions are held by women.

Data is ubiquitous, tools and frameworks are at our fingertips, and technology is covered earlier in schools, so people are getting involved in building technology products and applications at a younger age. We’ve lowered the barrier to entry (languages, frameworks, low-code/no-code tool sets) to make it easier to adopt technology without the overhead of complex coursework. With all these improvements, why are we still so far behind on diversity? In the US tech sector, 62% of jobs are held by white Americans. Asian Americans hold 20% of jobs. Latinx Americans hold 8% of jobs. Black Americans hold 7% of jobs. Only 26.7% of tech jobs are held by women.

We have the tools and training. Now we need to change the profile of the workforce to include a more diverse community.

What’s one piece of advice that you would share with anyone reading this?

For women who are reading this, I strongly encourage you to avoid self-doubt and gain confidence by arming yourselves with knowledge and mentors. By bringing our best selves forward, we can focus on opportunities and not obstacles. Technology is an enabler that will be a game changer in shaping society. Women have a role in how that technology is used and how society will be changed.

Additional Resources