Addressing GenAI Challenges on Cybersecurity, Project IDX, and Scraping Publishers' Data
AI Cybersecurity and Privacy Week in Review


Welcome to today’s newsletter!
This past week:
Being aware of the challenges and risks generative AI poses to cybersecurity, and how to address them
If content publishers don't want their data scraped by Google, they need to opt out, but how?
GPTBot by OpenAI crawls the web for public data, but its transparency makes it easy for webmasters to control access
Project IDX by Google
Some enterprises create their own small, domain-specific language models using their own data to solve specific business problems
I hope you enjoy this week’s newsletter!
Please subscribe to The AI Collective Word today to receive your free newsletter directly in your inbox and join our ever-growing network.
Share it with a friend or colleague if you find it helpful.
RISK AND SECURITY MANAGEMENT

Generative AI can have serious consequences for cybersecurity and digital trust, and it raises legal and ethical issues. Generative AI models depend on their training data, which can influence their output in unwanted ways. We need to be aware of the challenges and risks of generative AI content, and of how to address them.
New 'Deep Learning Attack' Deciphers Laptop Keystrokes with 95% Accuracy - The Hacker News
Researchers have shown that audio recorded by a nearby phone, or captured over a Zoom call, can be used to infer what is being typed on a laptop. Using a deep learning-based acoustic side-channel attack, they classified keystrokes with up to 95% accuracy, meaning passwords and other secrets could be exposed. The researchers trained on 36 keys of a MacBook Pro, pressed with different fingers and pressures.
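As a rough illustration of the pipeline such an attack relies on, the sketch below converts a keystroke audio clip to a mel-spectrogram and feeds it to a small CNN classifier. This is a hypothetical reconstruction, not the researchers' code; the model shape, sample rate, and parameters are assumptions.

```python
# Hypothetical sketch of an acoustic side-channel classifier:
# keystroke audio -> mel-spectrogram -> CNN -> predicted key.
import torch
import torch.nn as nn
import torchaudio

NUM_KEYS = 36  # the study classified 36 keys

# Convert a 1-second, 16 kHz keystroke recording to a mel-spectrogram.
to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

classifier = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, NUM_KEYS),  # one logit per candidate key
)

# Dummy waveform standing in for a recorded keystroke (batch, samples).
clip = torch.randn(1, 16000)
spec = to_mel(clip).unsqueeze(1)   # -> (batch, channel, mels, time)
logits = classifier(spec)
predicted_key = logits.argmax(dim=1)
```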
A remote worker at LastPass, the password manager company, unknowingly installed keylogger malware on his home computer. The hacker who planted it captured his credentials and passwords, breached the company's systems, stayed inside for months, and compromised data belonging to 33 million users. The incident shows how remote-work security can fail.
The US government wants to use AI to protect its digital systems from hackers. It's launching a two-year contest with DARPA and big tech companies to create AI tools that can find and fix software bugs. The White House says this is a way to work together and shape the future of cybersecurity.
REGULATIONS

The lower house of India's parliament has passed a new data privacy bill that requires companies to get user consent before processing their data, while giving the government certain exemptions and powers to regulate data collection and use. The bill aims to protect the privacy rights of Indian citizens in the digital age; it still needs approval from the upper house and the president to become law.
Draft rules from China's cyberspace regulator aim to protect the security and privacy of people whose data is processed by facial recognition technology. The rules require a clear purpose, strong necessity, and strict protective measures for using the technology, as well as the consent of the individuals involved. They also encourage non-biometric alternatives where possible.
A group of media organizations has called for new rules to protect the rights of creators whose data is used to train generative AI models. The open letter asks for global regulations that require transparency, consent, negotiation, identification, and quality control for AI services that use media content. The signatories say that AI models that use media data without permission or payment harm the media industry and reduce media diversity and public access to reliable information.
PRIVACY
Meta Platforms, the owner of Facebook and Instagram, is challenging a fine from Norway's data regulator for violating users' privacy. The company wants a court to stop the fine of 1 million crowns ($97,700) per day from Aug. 14 until Nov. 3. The regulator says Meta Platforms cannot use user data, such as location, for targeted ads. The fine could become permanent and apply to the whole of Europe if the European Data Protection Board agrees with the regulator.
Google faces a $5 billion lawsuit for allegedly tracking users' online activity without their consent, even when they used private browsing modes. A U.S. judge denied Google's request to dismiss the case, saying the company did not clearly inform users about its data collection practices. The plaintiffs claim Google violated their privacy and collected valuable information about their personal interests and preferences. The judge said there was evidence that users' data had a market value and that Google made promises to limit its data collection.
Google wants to use content created by digital publishers to train its AI, which raises legal and ethical questions. Google suggests that publishers who don't want their content used should opt out, which means they must learn how to do so and configure it on their own websites. Google frames this as part of building an AI-friendly internet with common standards, but some publishers may not trust Google to respect those standards or their rights.
ML AND NLP
OpenAI's new web crawler, GPTBot, collects publicly available data from the internet to train AI models. Because the bot identifies itself with its own user-agent string and honors robots.txt, webmasters can control its access, and OpenAI says it avoids paywalled, private, or sensitive content.
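For sites that want to opt out, OpenAI's published guidance is to disallow the GPTBot user agent in robots.txt. A minimal sketch (the directory paths here are illustrative, not from OpenAI's docs):

```
# Block GPTBot from the entire site
User-agent: GPTBot
Disallow: /

# Or, alternatively, allow only selected directories:
# User-agent: GPTBot
# Allow: /public/
# Disallow: /
```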
PLATFORM ENGINEERING
GitHub Copilot can now tell developers when its suggestions match code in a public repository - TechCrunch
GitHub Copilot helps developers write code faster, but it can cause problems when it generates suggestions that match existing public code. To address this, GitHub introduced a code-referencing feature that flags such matches and lets developers decide whether to use or ignore the suggestion. The feature can also help developers discover open-source repositories, and their licenses, that may be relevant to their projects.
The potential of ChatGPT for software testing - TechTarget
ChatGPT is an AI assistant that can help software testers with various tasks, such as generating unit tests, recommending test strategies, explaining code behavior, creating documentation, and suggesting test scenarios. ChatGPT can respond to natural language prompts and provide guidance for different types of applications and test cases. However, ChatGPT also poses some security risks, such as exposing sensitive data or introducing vulnerabilities in the code. Therefore, software teams should use ChatGPT with caution and follow best practices to ensure safe and effective testing.
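As a hypothetical illustration, a tester might paste a function into ChatGPT and ask it to "write pytest unit tests for this function"; the result could look something like the following. The function and tests below are invented for this example.

```python
# Hypothetical example: a small function and the kind of pytest unit
# tests an assistant like ChatGPT might draft for it.
import pytest

def normalize_email(address: str) -> str:
    """Lowercase and trim an email address; reject invalid input."""
    cleaned = address.strip().lower()
    if not cleaned or "@" not in cleaned:
        raise ValueError("invalid email address")
    return cleaned

def test_normalize_email_lowercases_and_trims():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

def test_normalize_email_rejects_missing_at_sign():
    with pytest.raises(ValueError):
        normalize_email("not-an-email")
```

Any tests an assistant generates should still be reviewed: they exercise only the behavior the model inferred, not necessarily the behavior the specification requires.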
Technical debt is a metaphor that compares shortcuts in software development to borrowing money. Ward Cunningham, who coined the term in 1992, explained that it is sometimes faster to ship imperfect code, but the debt must be repaid by improving the code later; otherwise it becomes hard to understand, modify, and maintain. He also pointed out that technical debt is influenced by human factors, such as programmers' skills and business goals, and that it affects the productivity and flexibility of engineering teams.
Introducing Project IDX - Google
Project IDX is a new experimental initiative from Google that aims to simplify and improve app development in the cloud. It offers a web-based workspace with familiar coding tools and fresh features, including collaboration, debugging, and AI assistance for tasks such as code completion and explanation. Google is gathering feedback to improve it, and there is a waitlist for the limited preview.
ETHICS

The Ethics of AI Ethics - arXiv
AI ethics is a field that studies how to ensure that AI systems are beneficial and fair for humans and society. Many ethics guidelines have been proposed to guide the design and use of AI, but they are not always consistent or comprehensive. This paper reviews and compares these guidelines and also examines how they are applied in practice. It also suggests ways to improve the effectiveness of AI ethics.
Researchers from Carnegie Mellon University, the Center for AI Safety, and the Bosch Center for AI have demonstrated that they can bypass the filters intended to prevent chatbots powered by generative AI models from producing toxic and harmful content. By appending specific adversarial character sequences to prompts, they tricked models including OpenAI's ChatGPT and Google Bard into generating disinformation, hate speech, and other output the models should have refused. The finding raises concerns about the safety and reliability of such models, especially when used autonomously, and highlights the challenges enterprises face in ensuring the safe use of generative AI applications.
USE CASES
Google Cloud helps industrial customers run and manage AI applications at the edge with the Vertex AI platform. This platform offers pre-trained and custom models for vision and video use cases, such as PPE detection, inventory management, and predictive maintenance. Customers can train and deploy their models on the public cloud, edge locations, and devices with ease and efficiency.
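As a loose sketch of the cloud-side half of that workflow, uploading and deploying a trained model with the Vertex AI Python SDK can look like the following. The project, bucket, model name, and container image are placeholders, not details from the article.

```python
# Hypothetical sketch using the google-cloud-aiplatform SDK; all
# identifiers below are invented placeholders for illustration.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload a trained vision model artifact to the Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="ppe-detector",
    artifact_uri="gs://my-bucket/ppe-model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    ),
)

# Deploy to a managed endpoint for online prediction.
endpoint = model.deploy(machine_type="n1-standard-4")
```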
Generative AI is a powerful technology that can write content and improve search results using large language models (LLMs). However, LLMs have some limitations, such as hallucinations, security risks, and lack of domain knowledge. Therefore, some enterprises are creating their own small, domain-specific language models using their own data. These models can solve specific business problems in different industries and use natural language processing more effectively.
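A minimal sketch of what building such a model can look like with Hugging Face Transformers, assuming a small open base model (distilgpt2 here) and a local text file of company documents; none of these specifics come from the article:

```python
# Hypothetical sketch: fine-tune a small open model on in-house text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Company documents, one example per line in a local text file.
dataset = load_dataset("text", data_files={"train": "company_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(
    tokenize, batched=True, remove_columns=["text"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-model", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Keeping the model small and the data in-house is the point: the resulting model only needs to be good at the narrow business problem, and the training data never leaves the enterprise.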
RESOURCES
This post presents an excellent summary of machine-learning certifications from Amazon, Google, IBM, and Microsoft that can help further your career, including tips and the skills you should know to prepare for each exam.
Docker Crash Course for Data Scientists - Data Science Horizons
This tutorial teaches Docker basics for data science: core concepts, components, and commands; building and running containers for data tasks; and optimizing and securing deployments. The course is practical and interactive, with examples and exercises to help you master Docker for data science. Read time: 21 minutes
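As a taste of the kind of setup the course covers, a minimal data-science Dockerfile might look like this; the base image, packages, and port are assumptions for illustration, not taken from the course:

```dockerfile
# Hypothetical data-science image: official Python base, a few
# analysis libraries, and a JupyterLab server as the entrypoint.
FROM python:3.11-slim

WORKDIR /app

RUN pip install --no-cache-dir pandas scikit-learn jupyterlab

COPY . /app

EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", \
     "--no-browser", "--allow-root"]
```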
Thank you for reading! Please send me feedback and share the newsletter with others.