
Search Results

8 items found

  • Bridging the Gap: Understanding the Role of Synthetic Data in Computer Vision

    As artificial intelligence and computer vision continue to advance, synthetic data has emerged as a powerful tool for building robust, high-performing models. However, there's a common misconception: that synthetic data needs to be photorealistic to be effective. In reality, the central challenge of using synthetic data isn't photorealism; it's capturing the complexity and variety of the real world.

    Synthetic Data: Photorealistic or Not?

    When it comes to training computer vision models, generalization is key: the model should be able to apply what it has learned from its training data to new, unseen data. If your goal is a model that generalizes well to real-world images, synthetic data that closely resembles real-world data (i.e., photorealistic data) can indeed be beneficial. But photorealism isn't always necessary. In some domain-specific tasks, such as training a model to identify geometric shapes or patterns, photorealism may matter less than accurate shape representation. Even non-photorealistic synthetic data can augment a dataset by creating variations of existing images through changes in lighting, orientation, or other aspects. This can significantly enhance model robustness, making the model adaptable to a wider range of situations. For example, see the project we, the Lexset team, recently published on the NVIDIA Developer blog: https://developer.nvidia.com/blog/better-together-accelerating-ai-model-development-with-lexset-synthetic-data-and-nvidia-tao/

    The Sim-to-Real Challenge

    Synthetic data offers tremendous advantages to any computer vision team, but it does present one notable challenge: the so-called "sim-to-real" transfer problem. The crux of this issue is not how photorealistic the synthetic data is, but how well it captures the variability and complexity of the real world. As we will explain, for teams that know how to work with synthetic data, this is less a challenge than an opportunity.

    If synthetic data fails to adequately represent the diverse scenarios, objects, lighting conditions, and other aspects of a domain that a model might encounter in the real world, the model may perform poorly on real-world data even if it performs well on synthetic data. This is the sim-to-real transfer problem. The issue is not exclusive to synthetic data: any model trained on a dataset that fails to represent the variety and complexity of the real world can show the same behavior once deployed. For instance, an object detection model intended to count people walking through a gate, trained exclusively on perfectly sunny synthetic images, may struggle when it encounters real-world fog or rain. Even if the synthetic data was photorealistic, the model might fail because it hasn't been exposed to the range of weather conditions it needs to handle.

    Overcoming the Sim-to-Real Transfer Problem

    Practitioners are now overcoming the sim-to-real problem, and we're tracking some of the best techniques:

    Domain Randomization: This approach creates synthetic data with a high degree of variability in lighting conditions, object orientations, textures, and more. The idea is that by exposing the model to a wide variety of conditions, it learns to generalize better to the real world. Citation: Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). https://arxiv.org/pdf/1703.06907.pdf

    Domain Adaptation: This method aims to reduce the distribution gap between synthetic and real-world data. Techniques like style transfer or feature-level alignment are used to make synthetic data more similar to real data, and vice versa. Citation: Ganin, Y., & Lempitsky, V. (2015). Unsupervised Domain Adaptation by Backpropagation. In Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1180-1189. http://proceedings.mlr.press/v37/ganin15.html

    Sim-to-Real Fine-tuning: In this approach, models are initially trained on synthetic data and then fine-tuned on a small amount of real-world data. This lets the model benefit from the scale and diversity of synthetic data while adapting to the peculiarities and nuances of the real world. By combining both, models can achieve robust performance. Citation: James, S., Davison, A. J., & Johns, E. (2017). Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task. In Proceedings of the 1st Conference on Robot Learning (CoRL). https://arxiv.org/pdf/1707.02267.pdf

    In each of these techniques, the key focus is not merely on creating photorealistic synthetic data, but on creating synthetic data that encapsulates the vast array of situations, objects, lighting conditions, and more that models might encounter in real-world environments. By doing so, these methods bridge the sim-to-real gap and build models that can successfully navigate the complexity and variability of the real world.

    Wrapping Up

    In conclusion, while photorealism can be crucial in some contexts, it's not a silver bullet for model training in computer vision. The key challenge with synthetic data is capturing the myriad scenarios and complexities of the real world; failing to do so leads to the sim-to-real transfer problem. Techniques like domain randomization, domain adaptation, and sim-to-real fine-tuning are paving the way to surmount this challenge, promising a future where synthetic data can effectively fuel the development of robust, real-world-ready models. So, as we move forward in the exploration of artificial intelligence and its potential, it's essential not just to strive for photorealism but, more importantly, to encapsulate the richness of real-world variability in our synthetic datasets.
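    To make domain randomization concrete, here is a minimal sketch in Python. The parameter names, ranges, and the render() stub are all illustrative; they stand in for whatever scene parameters your actual rendering pipeline consumes:

```python
import random

def sample_scene_params():
    """Randomly sample the scene parameters one synthetic render will use.

    Every field here is a hypothetical knob; a real pipeline would map
    these onto its lighting, camera, and material systems.
    """
    return {
        "sun_elevation_deg": random.uniform(5, 85),
        "sun_intensity":     random.uniform(0.2, 3.0),
        "object_yaw_deg":    random.uniform(0, 360),
        "texture":           random.choice(["wood", "metal", "concrete"]),
        "camera_height_m":   random.uniform(1.0, 3.0),
        "weather":           random.choice(["clear", "fog", "rain"]),
    }

def render(params):
    """Stand-in for a real renderer: just prints the sampled configuration."""
    print("rendering with", params)

# Each sampled configuration yields one synthetic image.
for _ in range(5):
    render(sample_scene_params())
```

    Over thousands of samples the model sees sunny scenes, fog, odd camera heights, and so on, which is exactly the variability the gate-counting example above was missing.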

  • Lexset is Hiring a Head of Sales & Marketing!

    Head of Sales & Marketing
    Gig Harbor, WA, USA (remote/flexible)

    Company Overview: Artificial Intelligence is transforming the world. Computer Vision, the subset of AI that gives computers the ability to understand what they see, is exploding in use in applications from security to manufacturing. However, obtaining the training data needed to create these applications is slow, expensive, and often yields poor results. Lexset software delivers Computer Vision teams the data they need, when they need it, in a scalable and repeatable fashion. Lexset is growing at an incredible pace, already counting some of the world's best AI companies as customers. We're looking for the right person to join us and accelerate our growth at this exciting time!

    Role: As the Head of Sales & Marketing, you will be responsible for defining revenue targets and executing a data-driven go-to-market strategy for Lexset's SaaS platform. You will work directly with the Co-Founder and COO to deliver results. You will be an important leader, joining on the ground floor of a rapidly growing company. You will join a diverse team and a flexible work environment, with a mix of work-from-home and in-office interactions. You will lead teams of people to execute on key growth strategies. You will help with some brand marketing, but revenue will be your focus.

    Responsibilities:
    • Develop and execute the go-to-market strategy, including KPIs and revenue targets. Set clear objectives and goals, research and define target audiences, develop marketing and communication strategies, and measure adoption.
    • Collaborate and lead across the organization. Represent the voice of the customer, bringing your insights to teams across product, design, user experience, engineering, and executive leadership.
    • Leadership. Lead and mentor a growing team of marketers across a broad set of functions, including product marketing, partner marketing, content, and brand.
    • Deepen relationships with key partners. Nurture existing relationships, aligning on mutual goals and driving adoption through a variety of co-marketing efforts.
    • Drive the evolution of our brand positioning. Bring the brand to life in close consultation with the Founders, delivering a clear and consistent brand that resonates with and motivates our target audiences.
    • Press and PR. Promote the company as an innovator to the press and public, and oversee outbound customer-facing communications across our website, blog, and social channels.
    • Prepare and manage monthly, quarterly, and annual budgets for the Marketing department.
    • Set, monitor, and report on team goals.
    • Analyze consumer behavior and define customer personas.
    • Identify opportunities to reach new market segments and expand market share.

    Requirements:
    • Experience as a Head or VP of Marketing or Sales, preferably in AI and/or data sales
    • Preferably more than 5 years of industry experience
    • Experience running successful marketing campaigns
    • Excellent written and verbal communication skills
    • Leadership skills, with the ability to set and prioritize goals
    • Knowledge of finance, especially expense management and profit and loss statements
    • Experience with web analytics, Google AdWords, and similar tools
    • Experience with CRM software
    • Enjoy working hard
    • Bachelor's degree in Marketing or a relevant field; MBA or similar preferred
    • Applicants must be currently authorized to work in the United States on a full-time basis.

    Contact: info@lexset.ai

    *We value diversity at Lexset. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

  • Lexset is Hiring!

    Company Overview: Artificial Intelligence is transforming nearly every industry and how we interface with computers. Obtaining good training data is one of the biggest bottlenecks in creating great Artificial Intelligence, and visual data for computer vision applications is some of the most difficult. Current methods for obtaining and labeling training data are slow, expensive, and inaccurate. Lexset uses simulation and procedural 3D content to deliver AI teams the data they need, when they need it, in a scalable and repeatable fashion. Lexset is solving one of the biggest problems in AI, building the world's first simulation solution built specifically for the needs of artificial intelligence teams.

    Role: This is a role for an engineer and builder who will develop features and functionality for Lexset's synthetic data platform and maintain strong relationships with our customers' technical teams. The candidate should love solving complex problems and be very detail-oriented, and should be as excited about building great technology as about communicating complex ideas visually and verbally. In this role, you will be responsible for understanding our customers' needs, understanding how they use our products, and helping customers integrate our products into their workflows. You will also implement new features as needed to support customer applications: interact with customers daily to understand their pain points and design solutions; help customers troubleshoot and debug simulations to generate synthetic data; design, develop, and deploy new features for our synthetic data platform; and manage the production of 3D content. This engineer should be passionate about 3D graphics and capable of working with various 3D content production tools.

    Responsibilities: Being a Synthetic Data Engineer at Lexset means you are responsible for ensuring that our customers are successful with our products and enabled to generate the types of datasets they need. You will perform integrations and design systems to help our customers achieve their goals. This position will also be responsible for educating customers about the value and benefits of working with synthetic data and helping create tutorials and educational materials.
    • Collaborate with other members of the engineering team to integrate customer feedback into the product.
    • Communicate complex solutions to customers, both visually and verbally.
    • Oversee the creation of educational materials that help people better understand the benefits of Lexset products and how synthetic data can improve the development of artificial intelligence.
    • Produce and manage the production of 3D content.
    • Write code and collaborate with engineering team members to design and implement new features as needed.

    Requirements:
    • Strong engineering background, preferably in Computer Graphics, Mathematics, or Computer Science
    • Experience working with both technical and non-technical partners and customers
    • Proficiency in languages such as JavaScript, Python, or similar
    • Experience building and deploying web applications
    • Proficiency with a suite of 3D modeling and rendering tools
    • Proficiency with Docker and Kubernetes
    • A solutions-oriented mindset, unafraid of hard problems and ambitious projects
    • Excellent problem-solving skills and the ability to learn new subject matter quickly

    We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

  • New in Seahaven: Realistic HDRI backgrounds

    There's a great new feature available now in Seahaven: realistic HDRI background controls. HDRI images are a powerful way to leverage photorealistic imagery. Traditionally, HDRIs are used as a lighting source, but we have added controls that let you use your HDRIs as backgrounds and composite your 3D assets into the images. Now you can use HDRIs to quickly generate photorealistic synthetic data.

    How HDRI backgrounds work

    HDRI images are traditionally mapped to a sphere and are very useful for providing realistic lighting and reflections from real-world contexts. However, mapping onto a sphere can sometimes produce unrealistic effects, leaving objects floating in space or appearing out of scale. To give our users more control over how these HDRIs are mapped to the environment, we added a few new options to our HDRI loader module. In addition to selecting from a set of pre-populated HDRI types, you can now project the lower half of the sphere onto the ground plane. This overcomes many of the issues associated with spherical projection, such as floating or out-of-scale items.

    We also added a height parameter. Height refers to the height of the camera when the HDRI was captured; most HDRI images are taken between 2 and 3 meters off the ground. You can adjust this parameter to reduce distortion around the horizon, which can appear if your camera moves too far above eye level on the z-axis.

    To help you start experimenting with these features, we have prepared a sample configuration to get you started:
    • Download the sample workflow
    • Sample Placement Rules
    • Sample 3D Asset

    Please note that the scale of the HDRI will influence your output. For example, in this demonstration we used a set of HDRI images called "skies." These images are typically sampled from wide-open spaces with few obstructions. Other collections, such as "indoor," represent tighter spaces. If you place your 3D assets far away from the camera, the resulting images will not be perfectly composited. Please be mindful of this when you build relationships and select your collections. If you have any issues adapting this sample to your needs, our team of engineers would be happy to work with you! Reach out at info@lexset.ai. If successful, your output will look similar to the sample images shown in the original post.
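    To make the ground-plane projection concrete, here is a minimal geometric sketch (plain Python with NumPy, not Seahaven's actual implementation). A view ray that points below the horizon is intersected with a ground plane assumed to sit capture_height meters below the camera that shot the HDRI; the function name and the 2.5 m default are illustrative:

```python
import numpy as np

def project_view_ray(view_dir, capture_height=2.5):
    """Map a view ray from a camera at (0, 0, capture_height) to an
    HDRI sample point. Rays at or above the horizon stay on the sky
    sphere; rays below it hit the ground plane z = 0.
    """
    d = np.asarray(view_dir, dtype=float)
    d /= np.linalg.norm(d)
    if d[2] >= 0:
        return ("sky", d)            # sample the sphere with d directly
    t = capture_height / -d[2]       # ray parameter at the ground plane
    hit = np.array([t * d[0], t * d[1], 0.0])
    return ("ground", hit)

print(project_view_ray([0.3, 0.1, -0.4]))  # below horizon -> ground hit
print(project_view_ray([0.3, 0.1, 0.4]))   # above horizon -> sky sample
```

    Note how t grows rapidly as the ray direction approaches the horizon (d[2] near zero): raising the virtual camera well above the capture height stretches nearby ground points, which is exactly the horizon distortion the height parameter helps you control.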

  • Using Seahaven: Creating the Dataset

    In this tutorial, we will build a simple object detection dataset: one that can be used to train a model to distinguish four similarly shaped screws against different backgrounds. We have a small set of real-world data, and we want to create new data using a simulation in which the four different screws are dropped onto different surfaces. We will use this simulation to create a training set for an object detection algorithm that identifies specific screws on different backgrounds. You can download a version of the tutorial dataset here, and the tutorial relationship and simulation files here. For a more detailed description of the individual modules and how to use them, please refer to our documentation.

    Prepare your 3D assets in Blender, then log in to your Lexset account.

    Import 3D Assets

    The 3D Assets tab is where you bring all of the 3D assets you will use in your simulation into Seahaven. Assets are separated into folders called collections; each collection should be composed of files you wish to sample from. The collections for this tutorial are pre-populated in your Lexset account: you will see four collections titled screw_1, screw_3, screw_6, and screw_8, with the corresponding .blend files located in each. Each collection is available for placement in the Relationship Editor.

    Create a color map

    From the 3D Assets tab, move on to the Colormap Editor tab. In Seahaven you must define a color map before you start generating data. For this dataset we made four categories, one for each type of screw we want to annotate.

    Relationship Editor

    The relationship file for this tutorial is pre-populated in your account and is available for use in the Load Relationship module in your simulation as DropScrewsToFloor.yaml. You can also download it from GitHub. To make your own relationship file:
    1. Navigate to the Relationship Editor tab.
    2. Create a relationship workflow: from the Workflow menu, select the Workflow and bring it into the workspace. Next, bring in the Relationship module, which places collections of 3D assets into the scene and defines their relationship to other objects using a parent/child system. Bring your Relationship module into the Workflow bay.
    3. Define a relationship: the top two drop-down menus in this module define the parent/child relationship. Select the name of the parent object in the box on the left and the name of the child object in the box on the right. For this dataset, pair each collection of screws with a ground plane, where the ground plane is the parent and the screw collection is the child.
    4. Configure the defined relationship: the Placement bay is where you place the Placement modules, which specify the configuration of the relationship between the parent and child objects. Use item count with a min/max value to set how many instances of a given 3D model may appear in a scene, i.e., how many screws are sampled from the collection; the suggested amount is 1-4.
    5. Bring the Locate module into the Placements bay. This module designates the range of placement in the scene via a vector. Place the screws a small distance above the ground plane, since they will be dropped with the Drop module. Other modules you may want to use here: Rotate, to give the screws a rotation range (we selected a range of 0 to 360 degrees), and Drop, which ray-casts the screws along the z-axis so they sit correctly on the ground plane.
    6. Copy and paste the relationship module for the other screws in the dataset. The pasted relationship keeps the same configuration, so you only need to change the parent and child objects.

    Simulation Editor

    Navigate to the Simulation Editor tab. Here you use the 3D assets and relationships you've just made to create the scene, define the camera and its positioning, and determine the annotation outputs you wish to include in your dataset. This workflow is separated into bays: Scene, Relationships, Camera, Color Map, Additional Output, and Resolution. You will use each of these to define the makeup of your dataset.
    1. In the Scene bay, bring in the ground plane and lighting. Here we use the HDRI module and the Ground Plane module: the HDRI module creates an HDRI for background and lighting, while the Ground Plane module generates a ground plane for our objects to rest on. Our ground plane is set to 10 m, and we use the "Abandoned" HDRI from the drop-down menu.
    2. Use the Relationship module in the Relationships bay to select the relationship you created earlier and load it into the simulation.
    3. Define your camera in the Camera bay. For this dataset we use the Cubic Translation module to give the camera a small amount of translation and rotation around a 3D bounding box. Be sure not to let your camera translate beyond the selected ground plane. Under the Properties bay, bring in the FOV module to define the camera's field of view; for this dataset we set it to 0.5.
    4. Under the Color Map bay, bring in the Color Map module from the Output menu and select the color map you made previously.
    5. The Additional Output bay is where you place other optional output types not included in standard COCO format. Today you will find a Depth Map module you can add; for other available outputs, contact Lexset at info@lexset.ai.
    6. Select the desired pixel resolution for your RGB data. For this dataset we chose a 512 px resolution.
    7. Name your dataset, give it a description, and choose the number of datapoints you wish to output (between 1 and 5000); for this dataset we are creating 1000. Then select Create Simulation.
    8. Navigate to the Simulation Manager tab, where you will see your simulation in the queue. Click the play button to start it; by clicking the eye icon you can watch the simulation run and generate your dataset.

    You can download the dataset here. This dataset was created for training an object detection model to accurately detect four separate screws, and it is one small example of the many ways Lexset synthetic data can be created and used. Similar datasets can be created for object detection models for all sorts of tasks: production-line object orientation, pedestrian detection, object counting, and many more. Lexset also creates complex data, from human figures to satellite imagery generated with sophisticated simulated sensors. The data Lexset creates is fast, high quality, and customizable to the needs of your project or model, ensuring engineers get more accurate models by focusing on training rather than time-consuming dataset acquisition. To learn more about dataset creation, please visit our documentation or reach out to us at info@lexset.ai.
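    Since the standard annotations are in COCO format, a quick way to sanity-check the finished dataset is to load it with the pycocotools library. A minimal sketch, assuming a hypothetical screws_dataset/annotations.json path and the four screw categories from this tutorial:

```python
from pycocotools.coco import COCO  # pip install pycocotools

# Path is hypothetical -- point it at the annotation file in the
# dataset Seahaven produced for you.
coco = COCO("screws_dataset/annotations.json")

# The category names should match the color map defined above.
cats = coco.loadCats(coco.getCatIds())
print([c["name"] for c in cats])   # e.g. screw_1, screw_3, screw_6, screw_8

# Count bounding boxes per category as a quick sanity check.
for cat in cats:
    ann_ids = coco.getAnnIds(catIds=[cat["id"]])
    print(cat["name"], len(ann_ids), "boxes")
```

    If the per-category counts are roughly consistent with the item-count range you configured (1-4 screws per scene across 1000 datapoints), the simulation ran as intended.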

  • Lexset announces Seahaven training data platform to accelerate vision AI applications using NVIDIA

    Lexset's Seahaven and NVIDIA TAO Toolkit accelerate transfer learning

    The biggest challenge in creating data-hungry computer vision systems is acquiring, cleaning, and labeling data. Today, data is collected and labeled all over the world in a slow, error-prone, and insecure process that looks more like the global supply chain for goods than like software development. By 2024, synthetically generated data will comprise 60 percent of the data used for AI development, according to Gartner Research.

    Lexset, a pioneer in synthetic data creation, announced the release of Seahaven, a new data generation platform. Scheduled for release later this month, it will give AI developers a powerful new tool in the race to revolutionize the data supply chain. Seahaven offers high-quality training data and eliminates the delays associated with human-in-the-loop data sourcing and labeling. This rapid, on-demand data delivery helps solve problems traditionally associated with collecting training data. With Seahaven, customers can quickly create high-quality training data that can be used with the NVIDIA TAO Toolkit to accelerate the creation of vision AI applications. Drawing from its massive 3D model library, Lexset uses procedural algorithms to create fully annotated synthetic datasets. The process makes it possible to generate datasets 12X faster than conventional methods, while matching or exceeding the performance of models trained on traditional data.

    Gathering data is just one step in the model creation process. The next step, creating a model that fits a given use case, can also be time-consuming. The NVIDIA TAO Toolkit is the CLI and Jupyter notebook version of NVIDIA TAO, an AI model adaptation framework. With it, users can create custom, production-ready AI models for their use case in a fraction of the time, without needing AI expertise. The toolkit leverages the power of transfer learning, applying the knowledge gained from solving one problem to a related problem in the same domain. With transfer learning, data scientists and engineers still need easy access to data to teach a model the nuances of the related problem. Because Seahaven enables on-demand creation of data, users can adapt AI models across domains in a fraction of the time.

    With Lexset's Seahaven and the NVIDIA TAO Toolkit, users can go from a dataset to a trained model at high velocity. Rapid access to training data, coupled with transfer learning, helps companies of all sizes and stages of development enter and compete in the global vision AI race by removing complexity and accelerating development.

    "We are excited to combine NVIDIA's world-class GPU-accelerated hardware and model training tools with Lexset synthetic data and provide a great accelerant to companies across the industry," said Lexset CEO Francis Bitonti.

    Models trained with Lexset data often achieve more than 90 percent precision and recall and improve baseline performance by 15 percent compared to models trained on real-world data alone. Lexset doesn't use an army of highly trained technical artists to make datasets; it generates simulations algorithmically, increasing speed and diversity.

    Click here for Media Inquiries.
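    TAO's own workflow is driven by spec files and pretrained models from NVIDIA's catalog, so the details vary by task. As a generic illustration of the transfer-learning idea described above (reuse a backbone pretrained on a large source dataset and retrain only a small head on new data), here is a minimal PyTorch sketch; PyTorch/torchvision and the four-class target task are assumptions, and nothing here is TAO-specific:

```python
import torch
from torchvision import models

# Start from a backbone pretrained on a large source dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the classifier head for the new task
# (four classes here is a hypothetical target problem).
num_classes = 4
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Only the new head is optimized. The training data would be the
# synthetic dataset, optionally followed by fine-tuning on real data.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

    Training only the new head is the cheapest variant of transfer learning; in practice, teams often unfreeze deeper layers once the head has converged.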

  • Synthetic Data is solving AI’s biggest problem

    Artificial intelligence has the potential to change the way people interact with computers, ushering in a new era, improving quality of life, and enhancing productivity. But that future is not inevitable. While AI performance is improving steadily, the most important part of creating an AI system, the data, hasn't improved in years. There is still no solution for the incredible time and energy AI engineers spend cultivating and labeling training data, the data used to teach AI. This means the AI industry is focused on incremental improvements to AI architectures while the raw materials that comprise AI are still living in the last decade.

    Lexset has the solution: better raw material for AI. Enter synthetic data.

    Lexset generates synthetic data algorithmically, to a creator's exact specifications. As a result, data is obtained quickly, engineered to combat model bias, and comes with complete and exact annotations. Creating similar datasets using traditional approaches is at best expensive and time-consuming, and sometimes outright impossible. And while the processes used to create synthetic data couldn't be more different from collecting real-world data, synthetic data is just like real-world data in that it has the same mathematical and statistical properties. You can use it to reflect the real world in AI systems, letting them train in a completely virtual world without the built-in problems of the real one.

    Problems with training data don't stop at acquisition and labeling

    One of the biggest problems in AI data, and the one you've probably read the most about, is bias. When data isn't labeled perfectly, or isn't annotated well enough, it can lead to systemic discrimination. It's something we talked about earlier this month; it is already pervasive in existing datasets, and unfortunately, it's not getting much better. Privacy issues have already caused existing datasets major problems, leading to the deletion of millions of training images and redaction in others, making them harder and harder for an AI to learn from effectively.

    Synthetic data doesn't inherently have these problems, which is why, according to Forrester Research, synthetic data could lead to "AI 2.0," making radical changes that improve AI. With Lexset's synthetic data, you receive pixel-perfect annotations, effectively eliminating labeling bias. Your AI will learn exactly what it's supposed to, and won't be misled by broken metadata or mislabeled and unlabeled data points. And since the data is entirely artificial, there are no privacy concerns to contend with; your AI won't have to learn around redacted or edited real-world images. Maybe best of all, it's faster and cheaper to generate synthetic data than to compile thousands and thousands of real-world annotated images.

    That's how Lexset can help. We're leading the industry in synthetic data generation. Our data is always unbiased, completely accurate, and built to exactly what you need. Better still, it's the easiest data in the industry to access, with users making their own datasets with Lexset tools. Lexset allows you to rapidly iterate on your models, building dramatically improved accuracy into your AI. Put simply, with Lexset's synthetic data, you can make sure your AI is doing the right thing, every time. Reach out to us to get your company started with our synthetic data!

  • Is your AI good? Not if your data is bad.

    According to a recent study from MIT, 10 of the most commonly used computer vision datasets have errors that are "numerous and widespread," and that could be hurting your AI.

    Datasets are core to the growing AI field, and researchers use the most popular machine-learning benchmarks to evaluate how the field is moving forward. These include image recognition datasets like ImageNet, or MNIST, which focuses on recognizing handwritten digits between 0 and 9. But the most popular AI datasets have some major problems. Even these popular and long-standing datasets have been shown to have major errors. For ImageNet, this included huge privacy problems when it was revealed that people's faces had been used in the dataset without their consent. MIT's most recent study looks at a more mundane, but possibly more important, problem: annotated data labels that are completely wrong. A couch is labeled as a kayak, a house as a marmot, or a warehouse pallet as a Honda Civic. For ImageNet, the test set had an estimated label error rate of over 5 percent, and for QuickDraw (a compilation of hand drawings), the rate was over 10 percent.

    How did the MIT study work?

    The study used 10 of the most popular datasets, each with a matching validation set. The MIT researchers built their own machine-learning model and used it to predict the labels in the test data. Where the model and the dataset disagreed, that piece of data was flagged for review, and five Amazon Mechanical Turk human reviewers voted on which label they thought was correct.

    What does this mean in the real world?

    The data people are using to train their AIs isn't any good, so their models are suffering. The research examined 34 models previously evaluated against the ImageNet test set, then evaluated them again against the roughly 1,500 examples where the data labels were wrong. It turned out that the models that didn't do well on the original, error-filled test set performed among the best once those labels were fixed. And simpler models, with correct data labels, performed far better than the more complicated models used by industry giants like Google, the ones everyone assumes to be the best in the field. Basically, these advanced models are doing a bad job, and it's all because the data is bad.

    So how do I fix my data problems?

    Lexset synthetic data. Here at Lexset, we create synthetic training data for your AI, and our data labels are always correct. You can create effectively unlimited datasets with pixel-perfect annotation, enabling rapid model iteration and advanced bias controls, all leading to dramatically improved accuracy. Whether you're trying to count, measure, monitor, or all of the above, our synthetic data makes it easy to train your AI to do the right thing, every time. Ready to get started? Reach out now and we'll have you on your way to perfect data annotations in no time.

    Note: the 10 datasets examined are MNIST, CIFAR-10, CIFAR-100, Caltech-256, ImageNet, QuickDraw, 20news, IMDB, Amazon Reviews, and AudioSet.
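    The disagreement-flagging step the study describes is easy to sketch. The researchers actually used confident learning (the statistically more careful approach behind the cleanlab library), but as a simplified illustration, here is a plain NumPy version that flags test items where a model's confident, out-of-sample prediction contradicts the given label; the function name and 0.5 threshold are illustrative:

```python
import numpy as np

def flag_suspect_labels(pred_probs, given_labels, threshold=0.5):
    """Flag examples whose given label disagrees with a model's
    confident prediction.

    pred_probs:   (n_examples, n_classes) out-of-sample probabilities
    given_labels: (n_examples,) integer dataset labels
    Returns indices worth sending to human review.
    """
    predicted = pred_probs.argmax(axis=1)
    confidence = pred_probs.max(axis=1)
    disagree = (predicted != given_labels) & (confidence >= threshold)
    return np.flatnonzero(disagree)

# Toy usage: random numbers standing in for real model outputs.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=100)
labels = rng.integers(0, 10, size=100)
print(flag_suspect_labels(probs, labels))
```

    In the study, items flagged this way went to human reviewers (five Mechanical Turk voters per item) rather than being relabeled automatically.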
