Thursday, July 25, 2024

What Is a Tabletop Exercise (TTX)?

Essence of Periodic Cyber Security Tabletop Exercises

Tabletop Exercises (TTXs) take various forms, ranging from internal self-assessments to external, paid engagements. These exercises offer eye-opening insights for technical teams, managers, leaders, and board members. To be effective, TTXs must feature customized, realistic scenarios tailored to your organization, which requires preparation and commitment from all participants. Scenarios should escalate so that they engage both technical and non-technical parties, particularly communications and legal teams. Common scenarios include malware leading to ransomware, loss of backups and credentials, failure of primary communication channels requiring out-of-band communication, and obligations to third parties, particularly customers/consumers. Additionally, technical simulations may involve log reviews and device outage scenarios. Gamification can be incorporated to let participants assume different roles within the organization, enhancing their understanding of responsibilities outside their own job descriptions.

 

Security Adversaries and Breach Problems

Threat actors emerge and disappear rapidly with each breach. Adversaries fall into three main categories:

  1. Hacktivists and Terrorists: Motivated by ethics and political biases, these actors generally have lower skill levels and focus on disruption and information disclosure.
  2. Criminals: Driven by financial gains, this category includes actors with a wide range of skill levels, from low to highly skilled, engaging in ransomware and extortion.
  3. State-Sponsored Actors: Motivated by geopolitical and financial gains, these actors possess moderate to high skill levels, focusing on data theft, espionage, and disruption.

Organizations must adopt simulation approaches tailored to their specific needs. A risk-based approach that targets the key systems with the greatest overall operational impact is essential. Insider threats must be considered, as disgruntled employees may have access to data and environments, as well as the knowledge to circumvent protections. The interconnectivity of the Internet can also enable threat actors to exploit connected systems.
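One way to make the risk-based approach concrete when choosing which scenarios to exercise first is a simple likelihood-times-impact score. The sketch below is illustrative only: the 1-to-5 scales, the scenario entries, and the function name rank_scenarios are assumptions, not a prescribed methodology, and each organization would substitute its own risk register.

```python
# Hypothetical scenario list; likelihood and impact use an assumed 1-5 scale.
scenarios = [
    {"name": "Ransomware on file servers", "likelihood": 4, "impact": 5},
    {"name": "Insider data exfiltration", "likelihood": 2, "impact": 4},
    {"name": "Loss of backup credentials", "likelihood": 3, "impact": 5},
    {"name": "Primary communication failure", "likelihood": 3, "impact": 3},
]


def risk_score(scenario: dict) -> int:
    """Score a scenario as likelihood multiplied by impact (a common simplification)."""
    return scenario["likelihood"] * scenario["impact"]


def rank_scenarios(items: list[dict]) -> list[dict]:
    """Return scenarios ordered from highest to lowest risk score."""
    return sorted(items, key=risk_score, reverse=True)


for s in rank_scenarios(scenarios):
    print(f"{risk_score(s):>2}  {s['name']}")
```

The highest-scoring scenarios are natural candidates for the next exercise, while lower-scoring ones can rotate in over time.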

 

Skilling and Resource Allocation for Threat Intelligence and Vulnerability Management

The first step in awareness is receiving alerts and notifications from credible industry and government agencies regarding threats, vulnerabilities, and other cyber events. Trusted sources such as CISA (www.cisa.gov) and the FBI's Internet Crime Complaint Center (www.ic3.gov), which accepts complaint filings, provide valuable information and resources. Allocating skilled resources to receive and process vulnerability notifications effectively is crucial. Organizations with an in-house or managed-service Security Operations Center (SOC) gain an advantage by efficiently triaging numerous alerts, ensuring vulnerabilities are promptly addressed by appropriately skilled personnel.

 

Business Email Compromise

Phishing remains a primary threat and source of compromise. An effective phishing simulation strategy is an essential component of an organization's education program. Customizing phishing simulations to target organizational dependencies, known weaknesses, and varying tactics enhances effectiveness. Providing immediate feedback and training when an employee clicks a phishing test link strengthens educational value and overall resilience. Using platforms and services to benchmark organizational click rates and report rates, and to track their trends, provides insight into how employees respond to potential phishing emails.
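As a rough illustration of the click-rate and report-rate tracking described above, the sketch below computes both rates per simulated campaign. The CampaignResult schema, the campaign names, and the numbers are all hypothetical; a real phishing simulation platform would export its own fields.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    """Outcome counts for one simulated phishing campaign (hypothetical schema)."""
    name: str
    delivered: int
    clicked: int
    reported: int


def rate(count: int, delivered: int) -> float:
    """Return count as a percentage of delivered emails, or 0.0 if none were delivered."""
    if delivered == 0:
        return 0.0
    return round(100.0 * count / delivered, 1)


def summarize(campaigns: list[CampaignResult]) -> list[dict]:
    """Compute click and report rates per campaign, in the order given."""
    return [
        {
            "name": c.name,
            "click_rate": rate(c.clicked, c.delivered),
            "report_rate": rate(c.reported, c.delivered),
        }
        for c in campaigns
    ]


# Illustrative numbers only, not real benchmark data.
campaigns = [
    CampaignResult("Q1 invoice lure", delivered=200, clicked=30, reported=40),
    CampaignResult("Q2 invoice lure", delivered=200, clicked=18, reported=70),
]

for row in summarize(campaigns):
    print(f"{row['name']}: click {row['click_rate']}%, report {row['report_rate']}%")
```

A falling click rate paired with a rising report rate across comparable campaigns is the trend an education program is generally aiming for.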

 

Communication During Outages

During major outages, out-of-band communication is crucial. Ensuring offline or basic communication methods, including mobile devices and printed materials, are updated and readily available is vital for maintaining operations and ensuring continuity. Trust and transparency are essential, so pre-established communication protocols, timely and accurate updates, and multi-channel communication involving organizational stakeholders must be maintained. Coordinated efforts leveraging emails, social media, hotlines, and other approved messaging channels are essential.

Organizations should adhere to their specific cyber Incident Response Plan (IRP) to ensure alignment with internal procedures, communication protocols, and legal, regulatory, and contractual requirements. Unified communication across member relations, public relations, and public affairs teams is also crucial. Additionally, risk management, insurers, Internal Audit, and forensics teams (or retainers) play significant roles from identification through triage and recovery.

 

Threat Actor General Philosophy

Leave communication and negotiations with threat actors to experts. Each organization should have specific rules of engagement, including for external services that act on its behalf. Internal communication must keep employees informed, provide additional instructions, and set clear expectations, including that information can change quickly during a security incident. While a communication cadence should be established, anticipate on-the-fly updates when significant changes occur.

During triage and post-incident phases, disconnection from parts of the organization, including business partners and vendors, may be necessary. Decisions on when to reconnect should be based on resolution assurance, normal-state restoration, and verification by a Letter of Containment.

 

Benefits and Common Practices

Similar to Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP) simulations, which have been in place for many decades, tabletop exercises identify gaps and surface improvements and efficiencies in current processes. They familiarize participants, from technicians to executive leadership, with common challenges, reaffirm roles, and address top organizational risks. Additionally, organizations should consider whom to contact in conjunction with their IRP, including law enforcement and relevant authorities.

Monday, July 22, 2024

Strengthening Tabletop Exercises From CrowdStrike’s Incident (Expanding Insights)

The CrowdStrike software outage on July 19th underscored critical vulnerabilities within our digital infrastructure and highlighted key areas for enhancement in our IT and security strategy.

 

Triggered by a faulty update to CrowdStrike's Falcon Sensor, the outage impacted multiple sectors globally, emphasizing the necessity for robust testing and staged rollouts of software updates. This incident serves as a stark reminder that, despite their computing power, our technology solutions are not infallible and depend heavily on human diligence. Mistakes can happen, and it is crucial to learn from these events to prevent future occurrences. Rigorous testing of updates can prevent disruptions and ensure stability; however, delaying updates lengthens the window in which known vulnerabilities and newly emerging threats can be exploited before protections are applied.
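To illustrate what a staged rollout means in practice, here is a minimal sketch that deploys an update ring by ring and halts as soon as a ring's failure rate exceeds a threshold. It is not a description of how CrowdStrike or any vendor actually deploys updates; the ring names, host names, the 1% threshold, and the deploy callback are all assumptions made for the example.

```python
from typing import Callable, Optional


def staged_rollout(
    rings: list[tuple[str, list[str]]],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> tuple[list[str], Optional[str]]:
    """Deploy to each ring in order; halt at the first ring whose failure rate
    exceeds max_failure_rate. Returns (completed ring names, halted ring or None)."""
    completed: list[str] = []
    for ring_name, hosts in rings:
        # deploy() returns True when the host is healthy after the update.
        failures = sum(1 for host in hosts if not deploy(host))
        failure_rate = failures / len(hosts) if hosts else 0.0
        if failure_rate > max_failure_rate:
            return completed, ring_name  # later rings are never touched
        completed.append(ring_name)
    return completed, None


# Hypothetical rings: a small canary group first, the broad fleet last.
rings = [
    ("canary", ["canary-1", "canary-2"]),
    ("early", [f"early-{i}" for i in range(10)]),
    ("broad", [f"host-{i}" for i in range(100)]),
]

touched: list[str] = []


def faulty_update(host: str) -> bool:
    """Simulate an update that breaks every host it reaches."""
    touched.append(host)
    return False


completed, halted = staged_rollout(rings, faulty_update)
print(f"completed={completed}, halted at={halted}, hosts affected={len(touched)}")
```

In this simulation the faulty update stops at the canary ring, so only two of the 112 hosts are affected. The trade-off noted above still applies: every ring adds delay before the full fleet receives the new protection.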

 

This event also reinforces the importance of comprehensive disaster recovery (DR) plans, which must be regularly tested to ensure rapid response and recovery. Relying heavily on a single vendor can pose significant risks. To mitigate such vulnerabilities and enhance supply chain resilience, organizations must diversify their suppliers and establish robust backup systems.

 

CrowdStrike's transparent communication during the outage, including detailed technical explanations and remediation steps, was pivotal in maintaining customer trust. This incident highlights the need for rigorous testing protocols, including regression testing and staged rollouts, to ensure updates are thoroughly vetted before deployment.

 

Every incident presents an opportunity for growth. This event exemplifies the importance of regular cybersecurity breach exercises to test our incident response capabilities and identify gaps in our plans. Investment in proactive monitoring tools and rapid response competencies is essential to quickly detect and address issues and contain their impact.

 

Following this incident, we anticipate a thorough root cause analysis by CrowdStrike to understand how the logic error in the software update occurred. This analysis should include a review of the development, testing, and deployment processes. Collaboration with partners such as Microsoft, AWS, and Google Cloud Platform is vital to develop scalable solutions and expedite remediation efforts. Providing detailed technical guidance and support to affected customers, including remediation documentation and direct assistance from engineers, is crucial. By implementing these changes and recommendations, we can enhance our resilience and reduce the likelihood of similar incidents in the future. 

 

Never let an incident go to waste... Let us prioritize these actions to safeguard our infrastructure and maintain the trust of our customers and stakeholders.


Friday, July 19, 2024

CrowdStrike Microsoft IT Security Event

CrowdStrike and Microsoft discussion starters - July 19th.

A perspective on the business interruption that occurred today, impacting operations globally. 

First, it is essential to emphasize that, based on the information currently available, this was not a cyberattack, was not caused by a threat actor, and did not compromise confidential information. However, the incident did affect system availability, one of our core security principles and cybersecurity incident criteria: Confidentiality, Integrity, and Availability.

While such occurrences are rare, they highlight the growing complexity and risks associated with centralized infrastructure and remote computing. At approximately 1 AM on July 19th, a CrowdStrike channel update rendered computing equipment (servers and workstations) unusable, freezing systems at the Microsoft Windows boot-up stage (blue screen). This incident likely resulted from inadequate testing or deployment processes before releasing content updates. Industry speculation includes potential failures within the CI/CD (Continuous Integration and Continuous Delivery/Deployment) pipeline, among other faults. Balancing the timely release of threat blockers with appropriate testing is a critical risk-and-reward consideration.

The situation was exacerbated by the presence of Microsoft BitLocker encryption on workstations, which, while protecting the systems, hindered alternative recovery processes without the decryption key. These keys are securely managed by system administrators, requiring skilled IT professionals to recover and apply corrective measures.

This incident underscores the need to explore zero-dollar retainers for IT services, enabling firms with Subject Matter Experts (SMEs) to support help desk recovery efforts effectively. It also highlights the importance of robust Business Continuity Planning (BCP) and Out Of Band (OOB) communication capabilities, ensuring business resilience and risk mitigation.

In light of these events, it is worth revisiting contract management and Service Level Agreements (SLAs) with partners and supply chain entities to understand each party's obligations. As a leading entity in the Endpoint Detection and Response (EDR) security space, CrowdStrike has far-reaching influence. Switching to a different EDR platform as an alternative involves significant reconfiguration, software installation, and upskilling of Security and IT teams. Weighing the impact of such a change, and balancing its benefits against overall risk, is not a trivial effort.

The validation of root causes and other lessons will unfold, along with potential SEC filings, negligence lawsuits, and other ramifications. It is imperative that we, as an industry, demand better performance and reliability from CrowdStrike and our partners.